Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models Paper • 2510.07048 • Published Oct 8, 2025 • 4
facebook/dinov3-vitb16-pretrain-lvd1689m Image Feature Extraction • 85.7M • Updated Aug 19, 2025 • 228k • 91
Article TimeScope: How Long Can Your Video Large Multimodal Model Go? • Jul 23, 2025 • 46
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 202
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published Mar 17, 2025 • 22
Temporal Preference Optimization Collection Temporal Preference Optimization for Long-form Video Understanding • 3 items • Updated Jan 19, 2025 • 5
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published Jan 23, 2025 • 23
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13, 2025 • 55
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Paper • 2501.03225 • Published Jan 6, 2025 • 7
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
Revisiting Active Learning in the Era of Vision Foundation Models Paper • 2401.14555 • Published Jan 25, 2024