Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.09956

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Paper • 2404.09990 • Published Apr 15, 2024 • 14
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Paper • 2404.09204 • Published Apr 14, 2024 • 11
Taming Latent Diffusion Model for Neural Radiance Field Inpainting

Paper • 2404.09995 • Published Apr 15, 2024 • 7

Papers - Audio - Encoder - Variational Auto-Encoder (VAE)

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - Audio - Fine-tuning - Alpaca

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - University - University of Michigan

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space

Paper • 2406.19370 • Published Jun 27, 2024 • 1

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Paper • 2306.15687 • Published Jun 23, 2023 • 1
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 37
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - Audio - Frechet Audio Distance (FAD) like FID

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - Audio - Clap

We use an ensemble filtering strategy based on two different CLAP models: 630k-audioset-best and 630k-best

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
Long-form music generation with latent diffusion

Paper • 2404.10301 • Published Apr 16, 2024 • 27
Music Consistency Models

Paper • 2404.13358 • Published Apr 20, 2024 • 14

Papers - Audio - Fine-tuning - DPO

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - University - National University of Singapore

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
Contrastive Chain-of-Thought Prompting

Paper • 2311.09277 • Published Nov 15, 2023 • 35
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Paper • 2406.06911 • Published Jun 11, 2024 • 12

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Paper • 2306.15687 • Published Jun 23, 2023 • 1
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 37
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Paper • 2404.09990 • Published Apr 15, 2024 • 14
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Paper • 2404.09204 • Published Apr 14, 2024 • 11
Taming Latent Diffusion Model for Neural Radiance Field Inpainting

Paper • 2404.09995 • Published Apr 15, 2024 • 7

Papers - Audio - Frechet Audio Distance (FAD) like FID

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - Audio - Encoder - Variational Auto-Encoder (VAE)

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - Audio - Clap

We use an ensemble filtering strategy based on two different CLAP models: 630k-audioset-best and 630k-best

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
Long-form music generation with latent diffusion

Paper • 2404.10301 • Published Apr 16, 2024 • 27
Music Consistency Models

Paper • 2404.13358 • Published Apr 20, 2024 • 14

Papers - Audio - Fine-tuning - Alpaca

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - Audio - Fine-tuning - DPO

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

Papers - University - University of Michigan

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space

Paper • 2406.19370 • Published Jun 27, 2024 • 1

Papers - University - National University of Singapore

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11
Contrastive Chain-of-Thought Prompting

Paper • 2311.09277 • Published Nov 15, 2023 • 35
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Paper • 2406.06911 • Published Jun 11, 2024 • 12

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs