Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding • Paper • 2604.05015 • Published 7 days ago • 227
AURA: Always-On Understanding and Real-Time Assistance via Video Streams • Paper • 2604.04184 • Published 8 days ago • 46
EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation • Paper • 2603.12108 • Published Mar 12 • 8
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing • Paper • 2603.12264 • Published Mar 12 • 14
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation • Paper • 2603.12247 • Published Mar 12 • 23
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports • Paper • 2603.09896 • Published Mar 10 • 27
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing • Paper • 2603.09877 • Published Mar 10 • 48
RISE-Video: Can Video Generators Decode Implicit World Rules? • Paper • 2602.05986 • Published Feb 5 • 27
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform • Paper • 2512.08478 • Published Dec 9, 2025 • 77