hexgrad posted an update Jan 6
📣 Looking for labeled, high-quality synthetic audio/TTS data 📣 Have you been or are you currently calling API endpoints from OpenAI, ElevenLabs, etc? Do you have labeled audio data sitting around gathering dust? Let's talk! Join https://discord.gg/QuGxSWBfQy or comment down below.

If your data exceeds the quantity & quality thresholds and is approved into the next hexgrad/Kokoro-82M training mix, and you permissively DM me the data under an effective Apache license, then I will DM back the corresponding voicepacks for YOUR data if/when the next Apache-licensed Kokoro base model drops.

What does this mean? If you've been calling closed-source TTS or audio API endpoints to:
- Build voice agents
- Make long-form audio, like audiobooks or podcasts
- Handle customer support, etc.
Then YOU can contribute to the training mix and get useful artifacts in return. ❤️
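
To make "labeled" concrete: each clip should be paired with the exact text that produced it. Here's a rough sketch of what a collection script could look like using the OpenAI Python SDK; the model and voice names and the metadata.jsonl layout are illustrative assumptions, not requirements of this project.

```python
# Sketch only: generate synthetic speech and keep the text label next to
# each clip. Assumes the official `openai` SDK and OPENAI_API_KEY in the
# environment; model/voice names and the metadata layout are illustrative.
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()
out_dir = Path("synthetic_audio")
out_dir.mkdir(exist_ok=True)

texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Kokoro is an 82 million parameter text-to-speech model.",
]

with (out_dir / "metadata.jsonl").open("w") as meta:
    for i, text in enumerate(texts):
        wav_path = out_dir / f"{i:06d}.wav"
        response = client.audio.speech.create(
            model="tts-1", voice="alloy", input=text, response_format="wav"
        )
        wav_path.write_bytes(response.content)
        # The text IS the label, so store it verbatim alongside the audio.
        meta.write(json.dumps({"file": wav_path.name, "text": text}) + "\n")
```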

More details at hexgrad/Kokoro-82M#21

TLDR: 🚨 Trade Offer 🚨
I receive: Synthetic Audio w/ Text Labels
You receive: Trained Voicepacks for an 82M Apache TTS model
Join https://discord.gg/QuGxSWBfQy to discuss
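
For anyone wondering what a voicepack buys you: it's the voice you select at inference time. A rough sketch of voicepack usage with the `kokoro` pip package, following the model card's example; the lang code and voice name here are just examples:

```python
# Sketch: synthesize speech with Kokoro-82M and a named voicepack.
# Assumes `pip install kokoro soundfile`; voice/lang codes are examples.
import soundfile as sf
from kokoro import KPipeline

pipeline = KPipeline(lang_code="a")  # 'a' = American English
text = "Voicepacks trained on contributed data plug in right here."

# The pipeline yields (graphemes, phonemes, audio) chunks at 24 kHz.
for i, (graphemes, phonemes, audio) in enumerate(pipeline(text, voice="af_heart")):
    sf.write(f"out_{i}.wav", audio, 24000)
```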

In what kind of format do you want this?

Hi, I tested it today. Nice work. Will there be German too in the future?

It's simple: what you put in is what you get out. 😄 German support in the future depends mostly on how much German data (synthetic audio + text labels) is contributed.

Tell me about quantum mechanics.

If you are looking for Arabic data, there are Common Voice, SADA, MASC, MGB-2, MGB-3, and MGB-5.

Hi, nice work! Do you think it's possible to replace the TTS part of the current end-to-end model (https://huggingface.co/openbmb/MiniCPM-o-2_6) with Kokoro, which I've heard is the perfect speed and size for edge devices?

Would it be possible to use this dataset to train German: https://huggingface.co/datasets/amphion/Emilia-Dataset/tree/main/DE ?
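
A hedged sketch of how the DE split could be inspected with the `datasets` library before committing to a full download; the repo layout and split name are assumptions, and the dataset's license terms would need to permit this use:

```python
# Sketch only: stream a few German samples from Emilia without downloading
# the whole split. The data_dir/split names are assumptions; check the
# dataset card (and its license) before training on it.
from datasets import load_dataset

ds = load_dataset(
    "amphion/Emilia-Dataset",
    data_dir="DE",      # German shards, per the link above
    split="train",
    streaming=True,     # avoid pulling every shard up front
)
for sample in ds.take(3):
    print(sample.keys())
```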

Why don't we try to crowdsource actual human voices? What would the conditions be (i.e., high quality, WAV encoding, clear pronunciation, noise-free environment, etc.)? I mean, 100 hours isn't that much, especially for a small and free model (basically a gift to mankind).
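
If contributions were crowdsourced, those conditions could be screened automatically before a clip enters the mix. A small sketch with `soundfile`; every threshold here is a placeholder assumption, not a stated project requirement:

```python
# Sketch: screen a contributed clip against placeholder intake conditions.
# Assumes `pip install soundfile numpy`; all thresholds are illustrative.
import numpy as np
import soundfile as sf

def passes_intake(path, min_sr=22050, max_seconds=30.0, min_peak_db=-30.0):
    info = sf.info(path)
    if info.format != "WAV":          # WAV encoding required
        return False
    if info.samplerate < min_sr:      # "high-quality" sample rate floor
        return False
    if info.duration > max_seconds:   # keep clips short and labelable
        return False
    audio, _ = sf.read(path)
    peak = float(np.max(np.abs(audio)))
    if peak == 0.0:                   # reject silent files
        return False
    # Crude loudness floor; a real pipeline would also estimate SNR
    # to enforce the noise-free environment condition.
    return 20 * np.log10(peak) >= min_peak_db

print(passes_intake("contribution.wav"))
```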

@hexgrad How many hours of data are needed for a new language, and do you train a separate model for each language or a single model for all?

How do you train this model?

Support for the German language would be great, because it would open up new possibilities for school, university, and continuing education. The solutions I'm aware of are either closed source behind paywalls or simply not good enough. What is the current planning status of this project? As far as I can see, having a German voice could make a big difference...