Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents Paper • 2505.24878 • Published May 30, 2025 • 23
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style Paper • 2410.16184 • Published Oct 21, 2024 • 25
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs Paper • 2410.16144 • Published Oct 21, 2024 • 5
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning Paper • 2312.14878 • Published Dec 22, 2023 • 15
EcoAssistant: Using LLM Assistant More Affordably and Accurately Paper • 2310.03046 • Published Oct 3, 2023 • 6