DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published Nov 9, 2025 • 52
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3, 2025 • 75
view article Article ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models Jul 27, 2024 • 34