Region-Constraint In-Context Generation for Instructional Video Editing Paper • 2512.17650 • Published 17 days ago • 50
AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning Paper • 2511.19304 • Published Nov 24, 2025 • 90
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 92
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 132
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment Paper • 2511.20614 • Published Nov 25, 2025 • 37
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Paper • 2505.23693 • Published May 29, 2025 • 53
RAFT: Adapting Language Model to Domain Specific RAG Paper • 2403.10131 • Published Mar 15, 2024 • 72
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer Paper • 2403.10301 • Published Mar 15, 2024 • 54
Veagle: Advancements in Multimodal Representation Learning Paper • 2403.08773 • Published Jan 18, 2024 • 10
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding Paper • 2403.09530 • Published Mar 14, 2024 • 10
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 11
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14, 2024 • 22
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13, 2025 • 191