XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models Paper • 2411.15100 • Published Nov 22, 2024 • 6
Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity Paper • 2502.01776 • Published Feb 3 • 3
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation Paper • 2505.18875 • Published May 24 • 42
NanoFlow: Towards Optimal Large Language Model Serving Throughput Paper • 2408.12757 • Published Aug 22, 2024 • 19
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving Paper • 2310.19102 • Published Oct 29, 2023 • 11