EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 134
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Paper • 2504.00487 • Published Apr 1, 2025 • 18
SimNPO-Unlearned Models Collection This collection hosts the SimNPO-unlearned models for the TOFU, MUSE, and WMDP unlearning benchmarks. • 7 items • Updated Aug 8, 2025 • 2