Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published 3 days ago • 116
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 8 days ago • 166
MedSAM3: Delving into Segment Anything with Medical Concepts Paper • 2511.19046 • Published 18 days ago • 48
Insights from the ICLR Peer Review and Rebuttal Process Paper • 2511.15462 • Published 23 days ago • 6
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published 29 days ago • 93
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16 • 106
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5 • 121
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 174
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published Oct 22 • 19
DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents Paper • 2510.19336 • Published Oct 22 • 16
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Paper • 2510.15742 • Published Oct 17 • 50
ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints Paper • 2510.14847 • Published Oct 16 • 55
WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published Oct 16 • 84
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published Oct 13 • 165
InfiniHuman: Infinite 3D Human Creation with Precise Control Paper • 2510.11650 • Published Oct 13 • 5