VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published Nov 14 • 112
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation Paper • 2509.21989 • Published Sep 26 • 22
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think Paper • 2504.20708 • Published Apr 29 • 23
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding Paper • 2503.17827 • Published Mar 22 • 8
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding Paper • 2503.17827 • Published Mar 22 • 8
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Paper • 2406.14563 • Published Jun 20, 2024 • 30
Vivid-ZOO: Multi-View Video Generation with Diffusion Model Paper • 2406.08659 • Published Jun 12, 2024 • 8