VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs Paper • 2603.23481 • Published 11 days ago • 6
CaptionQA: Is Your Caption as Useful as the Image Itself? Paper • 2511.21025 • Published Nov 26, 2025 • 28 • 3
Article 📌 Rethinking Multimodality from an Industry Perspective: Captioning Is Far More Important Than You Think Nov 29, 2025 • 3
HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption Paper • 2310.01779 • Published Oct 3, 2023 • 4
CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models Paper • 2311.11567 • Published Nov 20, 2023 • 8
ExCoT: Optimizing Reasoning for Text-to-SQL with Execution Feedback Paper • 2503.19988 • Published Mar 25, 2025