Usage Guidelines

#2
by C4G-HKUST - opened

Dear Users,

To ensure smooth GPU task processing, please adhere to these audio length guidelines:

Fast Mode (120s GPU budget, suitable for all users):

  • Keep audio under 4 seconds. Audio inputs longer than 4 seconds will be automatically trimmed to 4 seconds.
  • In multi-person mode, the 4-second limit applies to the combined length of all speakers' audio (concat mode) or to the longest single audio (pad mode).
  • The number of denoising steps is fixed at 8 for quick generation.
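
As a rough illustration of the Fast Mode rules above, the sketch below computes the length that counts against the 4-second limit. The function and constant names are hypothetical and not part of the Space's code; it only checks whether the limit is exceeded, since the post does not say how trimming is distributed across speakers.

```python
# Hypothetical helper names; only the 4-second limit and the
# concat/pad rules come from the guidelines above.
FAST_MODE_MAX_SECONDS = 4.0


def effective_audio_seconds(lengths, mode):
    """Return the audio length that counts against the Fast Mode limit.

    lengths: per-speaker audio durations in seconds (one entry for single-person use).
    mode: "concat" (lengths are summed) or "pad" (the longest one counts).
    """
    if not lengths:
        raise ValueError("at least one audio length is required")
    if mode == "concat":
        return sum(lengths)
    if mode == "pad":
        return max(lengths)
    raise ValueError(f"unknown mode: {mode!r}")


def exceeds_fast_mode_limit(lengths, mode):
    """True if the input is longer than 4 seconds and would be trimmed."""
    return effective_audio_seconds(lengths, mode) > FAST_MODE_MAX_SECONDS


if __name__ == "__main__":
    two_speakers = [3.0, 2.0]
    print(exceeds_fast_mode_limit(two_speakers, "concat"))  # 3.0 + 2.0 = 5.0 s
    print(exceeds_fast_mode_limit(two_speakers, "pad"))     # max is 3.0 s
```

With two clips of 3 and 2 seconds, concat mode counts 5 seconds (over the limit), while pad mode counts only the longer 3-second clip.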

Quality Mode (Dynamic GPU budget):

  • GPU duration is dynamically calculated as: 60s (preprocessing) + video_seconds × denoising_steps × 3 seconds.
  • For example, an 8-second audio clip with 25 denoising steps will take approximately 660 seconds (60 + 8 × 25 × 3 = 60 + 600 = 660s = 11 minutes).
  • You can adjust the number of denoising steps in advanced options to accommodate longer audio durations or reduce GPU time.
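
The Quality Mode formula can be sketched as a small calculator for estimating GPU time before submitting a job. The function and constant names here are illustrative assumptions; only the 60-second preprocessing term and the factor of 3 come from the formula above.

```python
# Illustrative names; the constants mirror the formula stated in the guidelines.
PREPROCESSING_SECONDS = 60
SECONDS_PER_STEP_PER_VIDEO_SECOND = 3


def quality_mode_gpu_seconds(video_seconds, denoising_steps):
    """Estimate GPU duration: 60 + video_seconds * denoising_steps * 3."""
    if video_seconds < 0 or denoising_steps < 0:
        raise ValueError("video_seconds and denoising_steps must be non-negative")
    return (
        PREPROCESSING_SECONDS
        + video_seconds * denoising_steps * SECONDS_PER_STEP_PER_VIDEO_SECOND
    )


if __name__ == "__main__":
    # The example from the guidelines: 8 seconds of audio, 25 steps.
    print(quality_mode_gpu_seconds(8, 25))  # 60 + 600 = 660
    # Fewer steps for the same audio reduces the estimate.
    print(quality_mode_gpu_seconds(8, 10))  # 60 + 240 = 300
```

Lowering the step count from 25 to 10 for the same 8-second clip brings the estimate from 660 seconds down to 300 seconds, which is the trade-off the advanced options expose.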

Longer Videos:
