
SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory

Dingcheng Zhen*✉ · Xu Zheng* · Ruixin Zhang* · Zhiqi Jiang*

Yichao Yan · Ming Tao · Shunshun Yin

SoulX-LiveAct presents a novel framework that enables lifelike, multimodal-controlled, high-fidelity human animation video generation for real-time streaming interactions.

(I) We identify diffusion-step-aligned neighbor latents as a key inductive bias for autoregressive (AR) diffusion. This yields Neighbor Forcing, a principled and theoretically grounded approach to step-consistent AR video generation.

(II) We introduce ConvKV Memory, a lightweight plug-in compression mechanism that enables constant-memory hour-scale video generation with negligible overhead.

(III) We develop an optimized real-time system that achieves 20 FPS on only two H100/H200 GPUs at 720×416 or 512×512 resolution, using end-to-end adaptive FP8 precision, sequence parallelism, and operator fusion.
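To make the constant-memory idea in (II) concrete, here is a toy sketch of a bounded memory. It is not the ConvKV Memory implementation from the paper: the class name `BoundedMemory`, its parameters, and the use of plain mean pooling (a uniform-kernel, strided 1D convolution) are all assumptions chosen for illustration. It only shows how compressing old entries keeps the cache size fixed no matter how long generation runs.

```python
from collections import deque


def mean_pool(vectors):
    """Average equal-length vectors element-wise (one uniform-kernel conv step)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class BoundedMemory:
    """Hypothetical fixed-size memory: recent raw entries plus pooled summaries."""

    def __init__(self, recent_size=8, compressed_size=4, stride=4):
        if not 1 <= stride <= recent_size:
            raise ValueError("stride must be between 1 and recent_size")
        self.recent_size = recent_size
        self.compressed_size = compressed_size
        self.stride = stride
        self.recent = deque()      # newest entries, kept uncompressed
        self.compressed = deque()  # older entries, each a pooled summary

    def append(self, vector):
        self.recent.append(list(vector))
        if len(self.recent) > self.recent_size:
            # Collapse the `stride` oldest raw entries into a single summary.
            block = [self.recent.popleft() for _ in range(self.stride)]
            self.compressed.append(mean_pool(block))
        if len(self.compressed) > self.compressed_size:
            # Merge the two oldest summaries so the total size stays bounded.
            oldest = self.compressed.popleft()
            second = self.compressed.popleft()
            self.compressed.appendleft(mean_pool([oldest, second]))

    def __len__(self):
        return len(self.recent) + len(self.compressed)
```

With the default settings the memory never holds more than `recent_size + compressed_size` entries, however many are appended.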

🔥🔥🔥 News

  • 📢 Mar 18, 2026: We now support consumer GPUs (e.g., RTX 4090, RTX 5090) with FP8 KV cache and CPU model offloading. In our tests, the 18B model (14B Wan2.1 + 4B audio module) achieves a throughput of 6 FPS on a single RTX 5090.
  • 👋 Mar 16, 2026: We release the inference code and model weights of SoulX-LiveAct.

🎥 Demo

👫 Podcast

🎤 Music & Talk Show

📱 FaceTime

📑 Open-source Plan

  • Release inference code and checkpoints
  • GUI demo support
  • End-to-end adaptive FP8 precision
  • Support model offloading for consumer GPUs (e.g., RTX 4090, RTX 5090) to reduce memory usage
  • Support FP4 precision for B-series GPUs (e.g., RTX 5090, B100, B200)
  • Release training code

โ–ถ๏ธ Quick Start

๐Ÿ› ๏ธ Dependencies and Installation

Step 1: Install Basic Dependencies

conda create -n liveact python=3.10
conda activate liveact
pip install -r requirements.txt
conda install conda-forge::sox -y
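After Step 1, a quick preflight check can confirm that the environment matches what the commands above set up. The sketch below is not part of the repository; the function name `check_basic_env` is hypothetical, and it only checks the Python version and whether a `sox` executable is on `PATH`.

```python
import shutil
import sys


def check_basic_env(required=(3, 10), tools=("sox",)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems
```

Calling `check_basic_env()` inside the activated `liveact` environment should return an empty list if Step 1 succeeded.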

Step 2: Install SageAttention

To enable the FP8 attention kernel, install SageAttention:

  • Install SageAttention:

    git clone https://github.com/thu-ml/SageAttention.git
    cd SageAttention
    git checkout v2.2.0
    python setup.py install
    
  • (Optional) Install the modified version of SageAttention, which is needed to enable QKV operator fusion with SageAttention:

    git clone https://github.com/ZhiqiJiang/SageAttentionFusion.git
    cd SageAttentionFusion
    python setup.py install
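Because SageAttention is installed separately from `requirements.txt`, it can be useful to probe for it before launching. This is a minimal sketch, assuming the package exposes the top-level import name `sageattention`; the helper names and the backend labels it returns are hypothetical and do not correspond to options in this repository.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def pick_attention_backend():
    # "sageattention" is assumed to be the import name of SageAttention;
    # the labels below are illustrative only.
    return "sage_fp8" if has_module("sageattention") else "default"
```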
    

Step 3: Install vLLM

To enable the FP8 GEMM kernel, install vLLM:

pip install vllm==0.11.0
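Since the vLLM version is pinned, a small check can flag a mismatched install before a run fails at import time. The helper below is a sketch using only the standard library; `check_pin` is a hypothetical name, and the exact string comparison is an assumption (a build with a local version suffix would be reported as a mismatch).

```python
from importlib import metadata


def check_pin(package, expected):
    """Return None if `package` is installed at `expected`, else a message."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed (expected {expected})"
    if installed != expected:
        return f"{package} {installed} is installed, but {expected} is pinned"
    return None
```

For this step the call would be `check_pin("vllm", "0.11.0")`.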

Step 4: Install LightVAE

git clone https://github.com/ModelTC/LightX2V
cd LightX2V
python setup_vae.py install

🤗 Download Checkpoints

Model Cards

| Model Name | Download |
| --- | --- |
| SoulX-LiveAct | 🤗 Hugging Face |
| chinese-wav2vec2-base | 🤗 Hugging Face |
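The checkpoints can also be fetched from Python with `huggingface_hub`. The sketch below rests on assumptions: `Soul-AILab/LiveAct` is taken from this model page, `TencentGameMate/chinese-wav2vec2-base` is an assumed repository ID for the wav2vec model and should be checked against the download link, and the local directory layout and function names are hypothetical.

```python
def checkpoint_plan(root="checkpoints"):
    """Return (repo_id, local_dir) pairs; repo IDs are assumptions to verify."""
    return [
        ("Soul-AILab/LiveAct", f"{root}/LiveAct"),
        # Assumed repo ID for the wav2vec model; confirm before use.
        ("TencentGameMate/chinese-wav2vec2-base", f"{root}/chinese-wav2vec2-base"),
    ]


def download_all(root="checkpoints"):
    """Download every planned repository (requires `pip install huggingface_hub`)."""
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in checkpoint_plan(root):
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

After `download_all()`, the two local directories would be passed as `--ckpt_dir` and `--wav2vec_dir` in the commands of the Inference section.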

🔑 Inference

Usage of LiveAct

1. Run real-time streaming inference on two H100/H200 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 416*720 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 20 \
    --dura_print \
    --input_json examples/example.json \
    --steam_audio
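The `$(shuf -n 1 -i 10000-65535)` expression picks a random master port, which may already be in use. When launching from Python, one alternative is to ask the operating system for a free port. This is a sketch with hypothetical helper names; it uses only the `torchrun` flags shown in the command above, and the port could still be claimed by another process between the check and the launch.

```python
import socket


def find_free_port():
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def build_torchrun_cmd(nproc, script, script_args, port=None):
    """Build the argument list for a torchrun launch like the one above."""
    if port is None:
        port = find_free_port()
    return [
        "torchrun",
        f"--nproc_per_node={nproc}",
        f"--master_port={port}",
        script,
        *script_args,
    ]
```

Passing the resulting list to `subprocess.run` without `shell=True` also keeps the `*` in a value such as `416*720` from being expanded by a shell.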

2. Run with the best performance settings

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 480*832 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json

3. Run with action or emotion editing

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535)  \
    generate.py \
    --size 512*512 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example_edit.json

4. Run on RTX 4090/RTX 5090 GPUs

Note: FP8 KV cache may slightly affect generation quality.

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
python generate.py \
    --size 416*720 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json \
    --fp8_kv_cache \
    --block_offload \
    --t5_cpu
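To give a feel for why an FP8 KV cache may slightly affect quality, the sketch below rounds ordinary floats to the nearest value representable in the FP8 E4M3 format and measures the error. It is an assumption that the cache uses E4M3; the README does not state the format or any scaling scheme, and real implementations typically apply scaling factors that this scalar simulation omits.

```python
import math

E4M3_MAX = 448.0      # largest finite value in the E4M3 (fn) format
E4M3_MIN_EXP = -6     # exponent of the smallest normal number
MANTISSA_BITS = 3     # explicit mantissa bits in E4M3


def round_to_e4m3(x):
    """Round a Python float to the nearest E4M3 value, saturating at the max."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)                 # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)       # clamp into the subnormal range
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between neighbouring values
    return sign * min(round(a / step) * step, E4M3_MAX)


def max_relative_error(values):
    """Largest relative rounding error over the non-zero inputs."""
    return max(abs(round_to_e4m3(v) - v) / abs(v) for v in values if v != 0.0)
```

For values in the normal range, three mantissa bits bound the relative rounding error at 2^-4 (6.25%), which is far coarser than BF16.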

5. Run on a single GPU for evaluation

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
python generate.py \
    --size 480*832 \
    --ckpt_dir MODEL_PATH \
    --wav2vec_dir chinese-wav2vec2-base \
    --fps 24 \
    --input_json examples/example.json \
    --audio_cfg 1.7 \
    --t5_cpu
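The `--audio_cfg` value is a classifier-free guidance scale. The standard guidance rule extrapolates from the unconditional prediction toward the conditional one, as sketched below. How this repository combines the audio-conditioned and unconditioned predictions internally is an assumption; the function name is hypothetical and plain lists stand in for tensors.

```python
def apply_guidance(uncond, cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

Under this rule a scale of 1.0 returns the conditional prediction unchanged, while a larger value such as 1.7 pushes the output further in the direction the audio condition suggests.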

Command Line Arguments

| Argument | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| --size | str | Yes | - | The width and height of the generated video. |
| --t5_cpu | bool | No | false | Whether to place the T5 model on the CPU. |
| --offload_cache | bool | No | - | Whether to place the KV cache on the CPU. |
| --fps | int | Yes | - | The target FPS of the generated video. |
| --audio_cfg | float | No | 1.0 | Classifier-free guidance scale for audio control. |
| --dura_print | bool | No | false | Whether to print the duration of every block. |
| --input_json | str | Yes | - | Path to the condition JSON file used to generate the video. |
| --seed | int | No | 42 | The seed used for generating the image or video. |
| --steam_audio | bool | No | false | Whether to run inference with streaming audio. |
| --mean_memory | bool | No | false | Whether to use the mean memory strategy during inference for further performance improvement. |
| --fp8_kv_cache | bool | No | false | Whether to store the KV cache in FP8 and dequantize to BF16 on use. FP8 KV cache may slightly affect generation quality. |
| --block_offload | bool | No | false | Whether to offload WanModel blocks to the CPU between block forwards. |
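The documented flags can be mirrored in a small `argparse` parser, for example to validate a command line in a wrapper script before launching. This is a hypothetical reconstruction from the table above, not the parser in `generate.py`; it assumes every `bool` flag is a store-true switch, and `parse_size` simply splits the `A*B` string without deciding which number is the width.

```python
import argparse


def parse_size(text):
    """Split an 'A*B' size string such as '416*720' into two integers."""
    first, second = text.split("*")
    return int(first), int(second)


def build_parser():
    """Hypothetical mirror of the documented flags (not the real parser)."""
    parser = argparse.ArgumentParser(description="Mirror of the documented flags")
    parser.add_argument("--size", type=str, required=True)
    parser.add_argument("--fps", type=int, required=True)
    parser.add_argument("--input_json", type=str, required=True)
    parser.add_argument("--audio_cfg", type=float, default=1.0)
    parser.add_argument("--seed", type=int, default=42)
    # Assumption: each boolean option is a switch that is off unless passed.
    for flag in ("--t5_cpu", "--offload_cache", "--dura_print", "--steam_audio",
                 "--mean_memory", "--fp8_kv_cache", "--block_offload"):
        parser.add_argument(flag, action="store_true")
    return parser
```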

💻 GUI demo

Run SoulX-LiveAct inference on the GUI demo and evaluate real-time performance.

Note: The first few blocks during the initial run require warm-up. Normal performance will be observed from the second run onward.
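When evaluating real-time performance, it helps to exclude the warm-up blocks from the measurement. The sketch below computes a steady-state frame rate from per-block durations, such as those printed with `--dura_print`. The function name, the `frames_per_block` parameter, and the default of two warm-up blocks are assumptions for illustration.

```python
def steady_state_fps(block_seconds, frames_per_block, warmup_blocks=2):
    """Frames per second over the blocks that follow the warm-up phase."""
    timed = block_seconds[warmup_blocks:]
    if not timed:
        raise ValueError("need at least one block after warm-up")
    return frames_per_block * len(timed) / sum(timed)
```

For instance, with durations `[5.0, 3.0, 1.0, 1.0]` seconds and 20 frames per block, skipping the first two blocks gives 40 frames in 2 seconds.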

1. Run real-time streaming inference on two H100/H200 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \
  demo.py \
  --ckpt_dir MODEL_PATH \
  --wav2vec_dir chinese-wav2vec2-base \
  --size 416*720 \
  --video_save_path ./generated_videos

2. Run on RTX 4090/RTX 5090 GPUs

USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
torchrun --nproc_per_node=1 --master_port=$(shuf -n 1 -i 10000-65535) \
  demo.py \
  --ckpt_dir MODEL_PATH \
  --wav2vec_dir chinese-wav2vec2-base \
  --size 416*720 \
  --fp8_kv_cache \
  --block_offload \
  --t5_cpu \
  --video_save_path ./generated_videos

📚 Citation

@misc{zhen2026soulxliveacthourscalerealtimehuman,
      title={SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory}, 
      author={Dingcheng Zhen and Xu Zheng and Ruixin Zhang and Zhiqi Jiang and Yichao Yan and Ming Tao and Shunshun Yin},
      year={2026},
      eprint={2603.11746},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.11746}, 
}

📮 Contact Us

If you would like to leave us a message about this work, feel free to email dingchengzhen@soulapp.cn.

You're welcome to join our WeChat group or Soul group for technical discussions.

