NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Paper • 2403.03100 • Published • 37
FAcodec trained on 50k hours speech data, with more timbre diversity and better at reconstructing speakers from podcasts, videos, games or animations.
See main repository for example usages.