U-ARM: Ultra Low-Cost General Teleoperation Interface for Robot Manipulation • arXiv:2509.02437 • Published Sep 2, 2025 • 5 upvotes
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising • arXiv:2511.08633 • Published Nov 9, 2025 • 54 upvotes
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds • arXiv:2511.08892 • Published Nov 12, 2025 • 201 upvotes
VCode: A Multimodal Coding Benchmark with SVG as Symbolic Visual Representation • arXiv:2511.02778 • Published Nov 4, 2025 • 101 upvotes
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B • arXiv:2511.06221 • Published Nov 9, 2025 • 132 upvotes
The Station: An Open-World Environment for AI-Driven Discovery • arXiv:2511.06309 • Published Nov 9, 2025 • 36 upvotes
Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models • arXiv:2511.01618 • Published Nov 3, 2025 • 10 upvotes
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs • arXiv:2510.11696 • Published Oct 13, 2025 • 176 upvotes
Artificial Hippocampus Networks for Efficient Long-Context Modeling • arXiv:2510.07318 • Published Oct 8, 2025 • 30 upvotes
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention • arXiv:2510.04212 • Published Oct 5, 2025 • 23 upvotes
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models • arXiv:2510.01623 • Published Oct 2, 2025 • 10 upvotes
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain • arXiv:2509.26507 • Published Sep 30, 2025 • 538 upvotes
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing • arXiv:2509.22186 • Published Sep 26, 2025 • 139 upvotes