Submitted by csuhan 100 ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents · 7 authors 2.6k 4
Submitted by JingweiZuo 70 Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance · 27 authors 108 5
Submitted by kenchan0226 47 VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning · 12 authors 19 4
Submitted by eliebak 21 Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding · 199 authors 452 2
Submitted by tulvgengenr 19 MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE · 7 authors 1.11k 2
Submitted by xiaofanghf 11 Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision · 8 authors 11 3
Submitted by HenghuiDing 10 Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation · 4 authors 85 2
Submitted by akhadangi 9 Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning · 5 authors 4 2
Submitted by jahnsonblack 7 DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation · 7 authors 228 2