HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading (arXiv:2502.12574, published Feb 18, 2025)
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models (arXiv:2305.17235, published May 26, 2023)