<div align="center">

<p align="center">
<img src="./assets/logo.png" width="200px" alt="Live Avatar Teaser">
</p>

<h1>🎬 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length</h1>
<!-- <h3>The code will be open source in <strong><span style="color: #87CEEB;">early December</span></strong>.</h3> -->

<p>
<a href="https://github.com/Yubo-Shankui" style="color: inherit;">Yubo Huang</a><sup>1,2</sup> ·
<a href="#" style="color: inherit;">Hailong Guo</a><sup>1,3</sup> ·
<a href="#" style="color: inherit;">Fangtai Wu</a><sup>1,4</sup> ·
<a href="#" style="color: inherit;">Shifeng Zhang</a><sup>1</sup> ·
<a href="#" style="color: inherit;">Shijie Huang</a><sup>1</sup> ·
<a href="#" style="color: inherit;">Qijun Gan</a><sup>4</sup> ·
<a href="#" style="color: inherit;">Lin Liu</a><sup>2</sup> ·
<a href="#" style="color: inherit;">Sirui Zhao</a><sup>2,*</sup> ·
<a href="http://staff.ustc.edu.cn/~cheneh/" style="color: inherit;">Enhong Chen</a><sup>2,*</sup> ·
<a href="https://openreview.net/profile?id=%7EJiaming_Liu7" style="color: inherit;">Jiaming Liu</a><sup>1,‡</sup> ·
<a href="https://sites.google.com/view/stevenhoi/" style="color: inherit;">Steven Hoi</a><sup>1</sup>
</p>

<p style="font-size: 0.9em;">
<sup>1</sup> Alibaba Group &nbsp;&nbsp;
<sup>2</sup> University of Science and Technology of China &nbsp;&nbsp;
<sup>3</sup> Beijing University of Posts and Telecommunications &nbsp;&nbsp;
<sup>4</sup> Zhejiang University
</p>

<p style="font-size: 0.9em;">
<sup>*</sup> Corresponding authors. &nbsp;&nbsp; <sup>‡</sup> Project leader.
</p>

<!-- Badges -->
<a href="https://arxiv.org/abs/2512.04677"><img src="https://img.shields.io/badge/arXiv-2512.04677-b31b1b.svg?style=for-the-badge" alt="arXiv"></a> <a href="https://huggingface.co/papers/2512.04677"><img src="https://img.shields.io/badge/🤗%20Daily%20Paper-ff9d00?style=for-the-badge" alt="Daily Paper"></a> <a href="https://huggingface.co/Quark-Vision/Live-Avatar"><img src="https://img.shields.io/badge/Hugging%20Face-Model-ffbd45?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace"></a> <a href="https://github.com/Alibaba-Quark/LiveAvatar"><img src="https://img.shields.io/badge/Github-Code-black?style=for-the-badge&logo=github" alt="Github"></a> <a href="https://liveavatar.github.io/"><img src="https://img.shields.io/badge/Project-Page-blue?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Project Page"></a>

</div>

> **TL;DR:** **Live Avatar** is an algorithm–system co-designed framework for real-time, streaming, infinite-length interactive avatar video generation. Powered by a **14B-parameter** diffusion model, it achieves **20 FPS** on **5×H800** GPUs with **4-step** sampling, and its **block-wise autoregressive** processing supports streaming videos of **10,000+** seconds.

<div align="center">

[![Watch the video](assets/demo.png)](https://www.youtube.com/watch?v=srbsGlLNpAc)

<strong>👀 More Demos:</strong> <br>
🤖 Human-AI Conversation &nbsp;|&nbsp; ♾️ Infinite Video &nbsp;|&nbsp; 🎭 Diverse Characters &nbsp;|&nbsp; 🎬 Animated Tech Explanation <br>
<a href="https://liveavatar.github.io/">
<strong>👉 Click Here to Visit Project Page! 🌐</strong>
</a>
<br>

</div>

---
## ✨ Highlights

> - ⚡ **Real-time Streaming Interaction** - achieves **20** FPS real-time streaming with low latency
> - ♾️ **Infinite-length Autoregressive Generation** - supports **10,000+**-second continuous video generation
> - 🎨 **Strong Generalization** - handles cartoon characters, singing, and diverse scenarios

---
## 📰 News
- **[2025.12.08]** 🚀 We released the real-time inference [code](infinite_inference_multi_gpu.sh) and the model [weights](https://huggingface.co/Quark-Vision/Live-Avatar).
- **[2025.12.08]** 🎉 LiveAvatar won Hugging Face's [#1 Paper of the Day](https://huggingface.co/papers/date/2025-12-05)!
- **[2025.12.04]** 🏃‍♂️ We committed to open-sourcing the code in **early December**.
- **[2025.12.04]** 🔥 We released the [paper](https://arxiv.org/abs/2512.04677) and the [demo page](https://liveavatar.github.io/).

---

## 📑 Todo List

### 🌟 **Early December** (core code release)

- ✅ Release the paper
- ✅ Release the demo website
- ✅ Release checkpoints on Hugging Face
- ✅ Release Gradio Web UI
- ✅ Experimental real-time streaming inference on H800 (or better) GPUs
- ✅ Distribution-matching distillation to 4 steps
- ✅ Timestep-forcing pipeline parallelism

### ⚙️ **Later updates**

- ⬜ UI integration for easy streaming interaction
- ⬜ Single-GPU inference code (offline generation)
- ⬜ Multi-character support
- ⬜ Training code
- ⬜ TTS integration
- ⬜ LiveAvatar v1.1

## 🛠️ Installation

Please follow the steps below to set up the environment.

### 1. Create Environment
```bash
conda create -n liveavatar python=3.10 -y
conda activate liveavatar
```

### 2. Install CUDA Dependencies (optional)
```bash
conda install nvidia/label/cuda-12.4.1::cuda -y
conda install -c nvidia/label/cuda-12.4.1 cudatoolkit -y
```

### 3. Install PyTorch & Flash Attention
```bash
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install flash-attn==2.8.3 --no-build-isolation
```
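
Optionally, you can sanity-check that the CUDA build of PyTorch and Flash Attention were installed correctly before continuing:

```bash
# Should print the torch version, "True" for CUDA availability,
# and the flash-attn version.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"
```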

### 4. Install Python Requirements
```bash
pip install -r requirements.txt
```
### 5. Install FFmpeg
```bash
apt-get update && apt-get install -y ffmpeg
```

---

## 📥 Download Models

Please download the pretrained checkpoints from the links below and place them in the `./ckpt/` directory.

| Model Component | Description | Link |
| :--- | :--- | :---: |
| `WanS2V-14B` | Base model | 🤗 [Hugging Face](https://huggingface.co/Wan-AI/Wan2.2-S2V-14B) |
| `LiveAvatar` | Our LoRA model | 🤗 [Hugging Face](https://huggingface.co/Quark-Vision/Live-Avatar) |

```bash
# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-S2V-14B --local-dir ./ckpt/Wan2.2-S2V-14B
huggingface-cli download Quark-Vision/Live-Avatar --local-dir ./ckpt/LiveAvatar
```
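
To verify that both checkpoints landed where the inference scripts expect them, you can list the two directories (an optional check):

```bash
# Both paths should exist and contain the files shown in the layout below.
ls ./ckpt/Wan2.2-S2V-14B ./ckpt/LiveAvatar
```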

After downloading, your directory structure should look like this:

```
ckpt/
├── Wan2.2-S2V-14B/                          # Base model
│   ├── config.json
│   ├── diffusion_pytorch_model-*.safetensors
│   └── ...
└── LiveAvatar/                              # Our LoRA model
    ├── liveavatar.safetensors
    └── ...
```

## 🚀 Inference
### Real-time Inference with TPP
> 💡 Currently, this command requires GPUs with at least 80 GB of VRAM.
```bash
# CLI inference
bash infinite_inference_multi_gpu.sh
# Gradio Web UI
bash gradio_multi_gpu.sh
```
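
Since the TPP pipeline spans five GPUs (see the notes below), on a larger node you may want to pin which devices it uses. A minimal sketch, assuming the scripts respect the standard CUDA device mask:

```bash
# Restrict the run to GPUs 0-4.
CUDA_VISIBLE_DEVICES=0,1,2,3,4 bash infinite_inference_multi_gpu.sh
```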
> 💡 The model generates video from audio input combined with a reference image and an optional text prompt.

> 💡 The `size` parameter sets the area of the generated video; the aspect ratio follows that of the original input image.

> 💡 The `--num_clip` parameter controls the number of video clips generated, which is useful for quick previews with shorter generation time. An example invocation is sketched below.

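A hypothetical quick-preview run might look like the following; the flag names and the `size` value here are assumptions based on the notes above, so check `infinite_inference_multi_gpu.sh` for the exact arguments it accepts.

```bash
# Hypothetical example: --num_clip 2 limits output to two clips for a fast
# preview; --size sets the target area, with the aspect ratio taken from the
# reference image.
bash infinite_inference_multi_gpu.sh --size 1024*704 --num_clip 2
```
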
> 💡 Currently, our TPP pipeline requires **five** GPUs for inference. We plan to develop a 3-step version that can be deployed on a 4-GPU cluster, and to integrate the [LightX2V](https://github.com/ModelTC/LightX2V) VAE component, which will remove the dependency on an additional GPU for VAE parallelism and enable 4-step inference on a 4-GPU setup.

Please visit our [project page](https://liveavatar.github.io/) for more examples and the scenarios this model is suited for.
## 📝 Citation

If you find this project useful for your research, please consider citing our paper:

```bibtex
@misc{huang2025liveavatarstreamingrealtime,
      title={Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length},
      author={Yubo Huang and Hailong Guo and Fangtai Wu and Shifeng Zhang and Shijie Huang and Qijun Gan and Lin Liu and Sirui Zhao and Enhong Chen and Jiaming Liu and Steven Hoi},
      year={2025},
      eprint={2512.04677},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.04677},
}
```
## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Alibaba-Quark/LiveAvatar&type=date&legend=top-left)](https://www.star-history.com/#Alibaba-Quark/LiveAvatar&type=date&legend=top-left)

## 📜 License Agreement
* The majority of this project is released under the Apache 2.0 license, as found in the [LICENSE](LICENSE) file.
* The Wan model (our base model) is also released under the Apache 2.0 license, as found in its [LICENSE](https://github.com/Wan-Video/Wan2.2/blob/main/LICENSE.txt).
* This project is a research preview. Please contact us ([email protected]) if you find any potential violations.

## 🙏 Acknowledgements

We would like to express our gratitude to the following projects:

* [CausVid](https://github.com/tianweiy/CausVid)
* [LongLive](https://github.com/NVlabs/LongLive)
* [WanS2V](https://humanaigc.github.io/wan-s2v-webpage/)