ReviewScore: Misinformed Peer Review Detection with Large Language Models Paper • 2509.21679 • Published Sep 25 • 63
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published May 24 • 36
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms Paper • 2503.14427 • Published Mar 18 • 19
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics Paper • 2406.14703 • Published Jun 20, 2024 • 2