Papers - Video
Video as the New Language for Real-World Decision Making
Paper
• 2402.17139
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Paper
• 2310.19512
VideoMamba: State Space Model for Efficient Video Understanding
Paper
• 2403.06977
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Paper
• 2401.09047
V3D: Video Diffusion Models are Effective 3D Generators
Paper
• 2403.06738
DragAnything: Motion Control for Anything using Entity Representation
Paper
• 2403.07420
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper
• 2201.12086
Video Editing via Factorized Diffusion Distillation
Paper
• 2403.09334
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding
Paper
• 2403.09530
3D-VLA: A 3D Vision-Language-Action Generative World Model
Paper
• 2403.09631
Generic 3D Diffusion Adapter Using Controlled Multi-View Editing
Paper
• 2403.12032
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Paper
• 2403.12943
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Paper
• 2403.12962
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
Paper
• 2403.14148
VidToMe: Video Token Merging for Zero-Shot Video Editing
Paper
• 2312.10656
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Paper
• 2403.14773
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper
• 2403.17920
Improving Automatic VQA Evaluation Using Large Language Models
Paper
• 2310.02567
Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians
Paper
• 2403.17898
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper
• 2401.12945
Garment3DGen: 3D Garment Stylization and Texture Generation
Paper
• 2403.18816
Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition
Paper
• 2403.19786
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Paper
• 2404.02101
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper
• 2404.02905
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper
• 2404.03413
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Paper
• 2404.09967
Dynamic Typography: Bringing Words to Life
Paper
• 2404.11614
Pegasus-v1 Technical Report
Paper
• 2404.14687
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Paper
• 2404.14507
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper
• 2404.16994
Capabilities of Gemini Models in Medicine
Paper
• 2404.18416
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Paper
• 2405.01434
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis
Paper
• 2406.06216
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Paper
• 2406.13457
What Matters in Detecting AI-Generated Videos like Sora?
Paper
• 2406.19568
Movie Gen: A Cast of Media Foundation Models
Paper
• 2410.13720
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Paper
• 2411.02397
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Paper
• 2411.18613
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper
• 2412.10360
HunyuanVideo: A Systematic Framework For Large Video Generative Models
Paper
• 2412.03603
Cosmos World Foundation Model Platform for Physical AI
Paper
• 2501.03575