KittenML/kitten-tts-nano-0.1 · Fine-tuning Kitten TTS Nano For another language

15 days ago

Hi and thanks for sharing this great model!
I'm interested in fine-tuning Kitten TTS for Persian (Farsi) speech synthesis, and I’d appreciate your guidance on a few key points:
Is fine-tuning for a new language like Persian supported or practical with this model?
Given that it was trained on English, I’d like to know how transferable the learned representations are to a different language, especially a low-resource one.
Roughly how much audio data would be needed for meaningful fine-tuning on Persian?
I understand it depends on the task and setup, but a ballpark estimate would help a lot (e.g., hours of audio, number of samples, etc.).
Are there any recommended training settings or constraints (batch size, learning rate, augmentation, etc.) that you found important when fine-tuning this architecture?
Does the model architecture support freezing early layers, or is end-to-end fine-tuning preferable?
Finally, do you provide or suggest any starter scripts, notebooks, or best practices for fine-tuning?
I’d really appreciate any help or pointers. Thank you in advance for your work and time!

arshambz changed discussion title from Fine-tuning Kitten TTS Nano to Fine-tuning Kitten TTS Nano For another language 15 days ago

mraudible1983

9 days ago

I would also be interested in this; please advise. I have a dataset.

juliproo

8 days ago

Me too!

robinxaix

6 days ago

Me too.