KVAE 2.0 is a family of video tokenizers with a temporal compression ratio of 4 and spatial compression ratios of 8 and 16.
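As a rough illustration of what these ratios mean for latent size, the sketch below computes the latent grid produced from a video of a given number of frames, height and width. It assumes that each ratio is applied as plain integer division and that the input dimensions are divisible by the ratios. Many video tokenizers (especially causal ones) treat the first frame separately, so the exact latent frame count of KVAE 2.0 may differ. `latent_shape` is a hypothetical helper for illustration, not part of the KVAE 2.0 API.

```python
# Hypothetical helper, NOT the KVAE 2.0 API: it assumes the compression
# ratios apply as plain integer division on (frames, height, width).
TIME_RATIO = 4            # temporal compression ratio from the description
SPATIAL_RATIOS = (8, 16)  # the two spatial compression ratios in the family


def latent_shape(frames: int, height: int, width: int,
                 spatial_ratio: int) -> tuple[int, int, int]:
    """Return the (frames, height, width) latent grid size under the stated assumption."""
    if spatial_ratio not in SPATIAL_RATIOS:
        raise ValueError(f"spatial_ratio must be one of {SPATIAL_RATIOS}")
    if frames % TIME_RATIO or height % spatial_ratio or width % spatial_ratio:
        raise ValueError("input dimensions must be divisible by the compression ratios")
    return frames // TIME_RATIO, height // spatial_ratio, width // spatial_ratio


if __name__ == "__main__":
    frames, height, width = 16, 512, 768
    for s in SPATIAL_RATIOS:
        t, h, w = latent_shape(frames, height, width, s)
        # Reduction in the number of spatio-temporal positions (channels ignored).
        factor = (frames * height * width) // (t * h * w)
        print(f"{s}x spatial: {frames}x{height}x{width} -> {t}x{h}x{w}, "
              f"{factor}x fewer positions")
```

Under these assumptions, the 8x variant reduces the number of spatio-temporal positions by 4 × 8 × 8 = 256 and the 16x variant by 4 × 16 × 16 = 1024; the latent channel count, which is not covered here, also affects the overall compression.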