LayaCodec

LayaCodec: Rapid, High-Fidelity Audio Compression: Reaching the Pareto Frontier in Neural Audio Codecs

This is a neural audio codec/tokenizer that encodes 16 kHz audio at rates from 12.5 t/s (0.16 kbps) to 50 t/s (0.65 kbps) using a single codebook of 8192 entries, and decodes it into 44.1 kHz audio. This allows for much faster and more scalable TTS models than other modern codecs, for several reasons.
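The bitrates quoted above follow from the token rate and the codebook size: with a single codebook, each token carries log2(8192) = 13 bits. The snippet below is a minimal sketch of that arithmetic only; the function name `bitrate_kbps` is made up for illustration and is not part of the LayaCodec code.

```python
import math


def bitrate_kbps(tokens_per_second: float, codebook_size: int) -> float:
    """Bitrate in kbps of a single-codebook codec (hypothetical helper)."""
    # Each token indexes one codebook entry, so it carries log2(size) bits.
    bits_per_token = math.log2(codebook_size)
    return tokens_per_second * bits_per_token / 1000.0


CODEBOOK_SIZE = 8192  # codebook size stated in this model card

# The two token rates stated in this model card.
for rate in (12.5, 50.0):
    print(f"{rate} t/s -> {bitrate_kbps(rate, CODEBOOK_SIZE):.4f} kbps")
```

This gives 0.1625 kbps at 12.5 t/s and 0.65 kbps at 50 t/s, matching the rounded figures above.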

  1. Token rates as low as 12.5 t/s, well below other single-pass codecs such as Xcodec2 (50 t/s), Snac (83 t/s), Dac (774 t/s), etc.
  2. A much smaller codebook (8192 entries) than Xcodec2 (65536), which speeds up TTS model training.
  3. Over 40x faster than most diffusion-based codecs, allowing for much simpler and larger-scale TTS models where the codec is not the bottleneck.
  4. Decodes to 44.1 kHz audio, which is much higher quality than the common 24 kHz or 16 kHz sampling rates.
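To make the token-rate comparison concrete, the sketch below counts how many tokens a TTS model would have to generate for one clip at each of the rates listed above. The 10-second clip length and the helper name `tokens_for_clip` are assumptions chosen for illustration; only the token rates come from this model card.

```python
import math

# Token rates (tokens per second) as quoted in this model card.
TOKEN_RATES = {
    "LayaCodec (lowest rate)": 12.5,
    "LayaCodec (highest rate)": 50.0,
    "Xcodec2": 50.0,
    "Snac": 83.0,
    "Dac": 774.0,
}


def tokens_for_clip(tokens_per_second: float, seconds: float) -> int:
    """Tokens needed to represent a clip (hypothetical helper)."""
    # Round up so a partial final frame still counts as one token.
    return math.ceil(tokens_per_second * seconds)


CLIP_SECONDS = 10.0  # assumed clip length, for illustration only

for name, rate in TOKEN_RATES.items():
    print(f"{name}: {tokens_for_clip(rate, CLIP_SECONDS)} tokens")
```

Under these assumptions, the lowest LayaCodec rate needs 125 tokens for the clip, against 830 at Snac's rate and 7740 at Dac's.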

Repo: https://github.com/ysharma3501/LayaCodec

This is still a work in progress: the model has only seen a few hundred hours of training data, yet the quality is already surprisingly good. It still needs more training.

The model is released under the permissive CC-BY-4.0 license, and the code is released under the Apache-2.0 license.

Thanks very much to the authors of FocalCodec and Anime-XCodec2.
