CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published 2 days ago • 50
TTCS: Test-Time Curriculum Synthesis for Self-Evolving Paper • 2601.22628 • Published 5 days ago • 31
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published 3 days ago • 182
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models Paper • 2602.02185 • Published 1 day ago • 116
Closing the Loop: Universal Repository Representation with RPG-Encoder Paper • 2602.02084 • Published 2 days ago • 78
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 6 days ago • 137
The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation Paper • 2601.17737 • Published 10 days ago • 55
PII & De-Identification Collection • Models for extracting PII entities and de-identifying clinical text, with support for HIPAA and GDPR compliance • 33 items • Updated 22 days ago • 29
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods Paper • 2601.21821 • Published 6 days ago • 55
Skywork-Unipic3 Collection • Unified Multi-Image Composition with Sequence Modeling • 7 items • Updated about 6 hours ago • 10