Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference Paper • 2405.18628 • Published May 28, 2024 • 1
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization Paper • 2503.12649 • Published Mar 16, 2025 • 1
MobileQuant: Mobile-friendly Quantization for On-device Language Models Paper • 2408.13933 • Published Aug 25, 2024 • 16
Augmenting CLIP with Improved Visio-Linguistic Reasoning Paper • 2307.09233 • Published Jul 18, 2023 • 9