InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published Oct 28, 2025 • 97
FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published Sep 18, 2025 • 114
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30, 2025 • 277
Facilitating large language model Russian adaptation with Learned Embedding Propagation Paper • 2412.21140 • Published Dec 30, 2024 • 18
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 45
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 68
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 55
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11, 2024 • 56
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 189
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models Paper • 2402.19427 • Published Feb 29, 2024 • 56
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29, 2024 • 53
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 152