Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 119
StateX: Enhancing RNN Recall via Post-training State Expansion Paper • 2509.22630 • Published Sep 26, 2025 • 3
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Paper • 2507.08771 • Published Jul 11, 2025 • 9
Cost-Optimal Grouped-Query Attention for Long-Context LLMs Paper • 2503.09579 • Published Mar 12, 2025 • 5
MARS: Unleashing the Power of Variance Reduction for Training Large Models Paper • 2411.10438 • Published Nov 15, 2024 • 13
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling Paper • 2410.07145 • Published Oct 9, 2024 • 2
A failed experiment: Infini-Attention, and why we should keep trying? Article • Published Aug 14, 2024 • 74
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published Jun 22, 2024 • 14