[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/gesturelsm-latent-shortcut-based-co-speech/gesture-generation-on-beat2)](https://paperswithcode.com/sota/gesture-generation-on-beat2?p=gesturelsm-latent-shortcut-based-co-speech) <a href="https://arxiv.org/abs/2501.18898"><img src="https://img.shields.io/badge/arxiv-gray?logo=arxiv"></a>

# GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling

# 📝 Release Plans

- [x] Inference Code
- [x] Pretrained Models
- [x] A web demo
- [x] Training Code

# ⚒️ Installation

## Build Environment

```
conda create -n gesturelsm python=3.12
conda activate gesturelsm
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
bash demo/install_mfa.sh
```
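
After installation, a quick sanity check that the core packages resolve can save a debugging round later. This is a minimal sketch, not part of the repo; `check_packages` is a hypothetical helper that looks packages up without importing them:

```python
import importlib.util

def check_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Packages installed by the conda/pip steps above
missing = check_packages(["torch", "torchvision", "torchaudio"])
print("missing packages:", missing or "none")
```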

## Download Model
```
# Download the pretrained models (Shortcut, Shortcut-reflow, Diffusion) and the RVQ-VAEs
gdown https://drive.google.com/drive/folders/1OfYWWJbaXal6q7LttQlYKWAy0KTwkPRw?usp=drive_link -O ./ckpt --folder

# Download the SMPL model
gdown https://drive.google.com/drive/folders/1MCks7CMNBtAzU2XihYezNmiGT_6pWex8?usp=drive_link -O ./datasets/hub --folder
```

## Download Dataset
> Needed for evaluation and training; not necessary for running the web demo or inference.

- Download the original raw data
```
bash preprocess/bash_raw_cospeech_download.sh
```

## Eval
> Requires the downloaded dataset.
```
# Evaluate the pretrained shortcut model (20 steps)
python test.py -c configs/shortcut_rvqvae_128.yaml

# Evaluate the pretrained shortcut-reflow model (2 steps)
python test.py -c configs/shortcut_reflow_test.yaml

# Evaluate the pretrained diffusion model
python test.py -c configs/diffuser_rvqvae_128.yaml
```
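
The three evaluations above can be batched. This sketch only assembles the command lines from the README; `eval_command` is a hypothetical helper, and the `subprocess.run` call is left commented out so nothing executes until you opt in:

```python
import subprocess

# Config files listed in the Eval section above
CONFIGS = [
    "configs/shortcut_rvqvae_128.yaml",   # shortcut model, 20 steps
    "configs/shortcut_reflow_test.yaml",  # shortcut-reflow model, 2 steps
    "configs/diffuser_rvqvae_128.yaml",   # diffusion model
]

def eval_command(config):
    """Command line for evaluating one config with the repo's test.py."""
    return ["python", "test.py", "-c", config]

for cfg in CONFIGS:
    print(" ".join(eval_command(cfg)))
    # subprocess.run(eval_command(cfg), check=True)  # uncomment to actually run
```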

## Train RVQ-VAEs
> Requires the downloaded dataset.
```
bash train_rvq.sh
```

## Train Generator
> Requires the downloaded dataset.
```
# Train the shortcut model
python train.py -c configs/shortcut_rvqvae_128.yaml

# Train the diffusion model
python train.py -c configs/diffuser_rvqvae_128.yaml
```

## Demo
```
python demo.py -c configs/shortcut_rvqvae_128_hf.yaml
```

# 🙏 Acknowledgments
Thanks to [SynTalker](https://github.com/RobinWitch/SynTalker/tree/main), [EMAGE](https://github.com/PantoMatrix/PantoMatrix/tree/main/scripts/EMAGE_2024), and [DiffuseStyleGesture](https://github.com/YoungSeng/DiffuseStyleGesture); our code partially borrows from these useful repos, which are worth checking out.

# 📖 Citation

If you find our code or paper helpful, please consider citing:

```bibtex
@misc{liu2025gesturelsmlatentshortcutbased,
      title={GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling},
      author={Pinxin Liu and Luchuan Song and Junhua Huang and Chenliang Xu},
      year={2025},
      eprint={2501.18898},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.18898},
}
```