Process Reward Models that Think -- https://arxiv.org/abs/2504.16828
AI & ML interests
Factuality, reasoning, alignment, LLM applications
Spaces (7)
🎲 LudoBench: Multimodal Game Reasoning Benchmark [ICLR 2026]
🛑 Answer Convergence Early Stopping: Demo for the EMNLP paper "Answer Convergence as a Signal..."
🏆 FactRBench: View and analyze the long-form factuality leaderboard
🚀 ExpertLongBench: Leaderboard for ExpertLongBench
🚀 ManyICLBench: Leaderboard for ManyICLBench
📊 MLRC-BENCH: Display model performance rankings
Datasets (13)
launch/LudoBench
launch/ExpertLongBench
launch/thinkprm-1K-verification-cots
launch/ManyICLBench
launch/CMV
launch/FactRBench
launch/FactBench
launch/CLASH
launch/gov_report
launch/gov_report_qs