allenai/Olmo-3.1-32B-Think
Text Generation • 32B
The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets...
Note 📈 Scaling RL to make our latest model.
Note 💨 Our best model yet for chat & sensitive tasks.
Note 🧮 Improved RL Zero performance & more training steps!
Note 💻 Improved RL Zero performance & more training steps!
Note Large datasets of completions used to filter prompts for our RL runs.
Note A very large set of completions across many models for preference tuning and reward modeling research.