In-Context Reinforcement Learning for Tool Use in Large Language Models Paper • 2603.08068 • Published 19 days ago • 41
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published Feb 18, 2025 • 19
ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention Paper • 2507.01004 • Published Jul 1, 2025 • 10
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
Optimizing Anytime Reasoning via Budget Relative Policy Optimization Paper • 2505.13438 • Published May 19, 2025 • 36
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization Paper • 2503.01328 • Published Mar 3, 2025 • 16
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization Paper • 2503.01328 • Published Mar 3, 2025 • 16
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization Paper • 2503.01328 • Published Mar 3, 2025 • 16 • 3
Running 3.75k The Ultra-Scale Playbook 🌌 3.75k The ultimate guide to training LLM on large GPU Clusters
🔱 Sailor2 Language Models Collection Sailing in South-East Asia with Inclusive Multilingual LLMs • 32 items • Updated 26 days ago • 30