LUFFY-RL

Elliott/LUFFY-Qwen-Math-7B-Zero • Text Generation • 8B • Updated Apr 23, 2025 • 264 • 1
Elliott/Qwen2.5-Math-7B-16k-think • Text Generation • 8B • Updated May 28, 2025 • 945 • 3
Elliott/Openr1-Math-46k-8192 • Viewer • Updated Apr 23, 2025 • 45.8k • 535 • 8
Learning to Reason under Off-Policy Guidance • Paper • 2504.14945 • Published Apr 21, 2025 • 88