Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models • Paper • 2504.02273 • Published Apr 3, 2025
ROOT: Robust Orthogonalized Optimizer for Neural Network Training • Paper • 2511.20626 • Published Nov 25, 2025
Kimi Linear: An Expressive, Efficient Attention Architecture • Paper • 2510.26692 • Published Oct 30, 2025
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction • Paper • 2510.01817 • Published Oct 2, 2025
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning • Paper • 2508.18756 • Published Aug 26, 2025
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning • Paper • 2505.17667 • Published May 23, 2025
Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning • Paper • 2505.09738 • Published May 14, 2025
IndicTTS Datasets • Collection • Datasets derived from the Indic TTS Database, a special corpus of Indian languages developed by the Speech Technology Consortium at IIT Madras. • 13 items • Updated Mar 6, 2025
Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus • Paper • 2410.14815 • Published Oct 18, 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • Paper • 2412.13663 • Published Dec 18, 2024
ModernBERT • Collection • Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024
Falcon3 • Collection • The Falcon3 family of open foundation models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated Nov 6, 2025
MiniPLM • Collection • Pre-trained models in MiniPLM: Knowledge Distillation for Pre-Training Language Models • 5 items • Updated Oct 21, 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models • Paper • 2410.17215 • Published Oct 22, 2024
Structured 3D Latents for Scalable and Versatile 3D Generation • Paper • 2412.01506 • Published Dec 2, 2024