PapaRazi/Ijazah_Palsu_V2 · 🇮🇩 Indonesian TTS Model (F5-TTS)
Ijazah_Palsu_V2 is a fine-tuned Indonesian speech synthesis model based on F5-TTS.
It was trained using a custom-curated dataset called PapaRazi/id-tts-v2, focusing on natural and expressive Indonesian speech generation.
🧠 Model Details
- Base Framework: F5-TTS
- Training Time: ~3 days
- Dataset Size: ~70,000 samples (70 hours)
- Languages:
  - Bahasa Indonesia (95%)
  - English (5%); English quality is limited because so little of the dataset is English
- License: Non-commercial use only
- Author: [PapaRazi](https://huggingface.co/PapaRazi) / https://github.com/adigayung
🛠 Training Configuration
{
  "exp_name": "F5TTS_v1_Base",
  "learning_rate": 1e-05,
  "batch_size_per_gpu": 1700,
  "batch_size_type": "frame",
  "max_samples": 64,
  "grad_accumulation_steps": 1,
  "max_grad_norm": 1,
  "epochs": 34,
  "num_warmup_updates": 7000,
  "save_per_updates": 15000,
  "keep_last_n_checkpoints": 7,
  "last_per_updates": 15000,
  "finetune": true,
  "file_checkpoint_train": "",
  "tokenizer_type": "char",
  "tokenizer_file": "",
  "mixed_precision": "fp16",
  "logger": "tensorboard",
  "bnb_optimizer": false
}
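Because `batch_size_type` is `"frame"`, the value `1700` counts mel frames rather than clips. The sketch below is a rough way to translate that into seconds of audio. It assumes the F5-TTS default mel settings (24 kHz sample rate, hop length 256), which this card does not state, and the per-epoch estimate further assumes a single GPU with perfectly packed batches, which is also not stated. The function and key names are made up for illustration.

```python
# Values copied from this model card.
BATCH_FRAMES_PER_GPU = 1700   # "batch_size_per_gpu" with "batch_size_type": "frame"
DATASET_HOURS = 70            # approximate
DATASET_CLIPS = 70_000        # approximate

# ASSUMPTIONS (not stated in the card): F5-TTS default mel settings.
SAMPLE_RATE_HZ = 24_000
HOP_LENGTH = 256


def frame_budget() -> dict:
    """Translate the frame-based batch size into rough, human-readable numbers."""
    frames_per_second = SAMPLE_RATE_HZ / HOP_LENGTH
    seconds_per_batch = BATCH_FRAMES_PER_GPU / frames_per_second
    avg_clip_seconds = DATASET_HOURS * 3600 / DATASET_CLIPS
    total_frames = DATASET_HOURS * 3600 * frames_per_second
    return {
        "frames_per_second": frames_per_second,
        "seconds_per_batch": seconds_per_batch,
        "avg_clip_seconds": avg_clip_seconds,
        "avg_clips_per_batch": seconds_per_batch / avg_clip_seconds,
        # ASSUMPTION: one GPU, every batch filled exactly to the frame budget.
        "updates_per_epoch_single_gpu": total_frames / BATCH_FRAMES_PER_GPU,
    }


if __name__ == "__main__":
    for key, value in frame_budget().items():
        print(f"{key}: {value:.2f}")
```

Under those assumptions, one GPU batch holds about 18 seconds of audio, or roughly five clips of average length (3.6 s), so the `max_samples` cap of 64 would rarely be the limiting factor. The same assumptions give on the order of 13,900 updates per epoch, which would put the 7000 warmup updates at about half an epoch.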
📦 Dataset
The training dataset, PapaRazi/id-tts-v2, consists of curated and cleaned audio-text pairs in Bahasa Indonesia. All preprocessing, splitting, and cleaning were done with a custom tool I developed: 🔧 whisper-tools
The default dataset splitter from F5-TTS produced inconsistent results (clips that were too short or far too long), so I built a custom pipeline to ensure clean, consistent samples.
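To illustrate the kind of length control described above, here is a minimal sketch that greedily merges consecutive transcript segments into clips within a duration window and drops anything that cannot fit. It is not the whisper-tools implementation, which this card does not describe. The `Segment` type, the `build_clips` function, and the 2 s and 12 s bounds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_clips(
    segments: Iterable[Segment],
    min_sec: float = 2.0,   # hypothetical lower bound
    max_sec: float = 12.0,  # hypothetical upper bound
) -> List[Segment]:
    """Merge consecutive segments so each clip lasts between min_sec and max_sec."""
    clips: List[Segment] = []
    current: Optional[Segment] = None

    def flush() -> None:
        # Keep the clip being built only if it is long enough, then reset.
        nonlocal current
        if current is not None and current.duration >= min_sec:
            clips.append(current)
        current = None

    for seg in segments:
        if seg.duration > max_sec:
            # A single segment that is too long is dropped, not truncated.
            flush()
            continue
        if current is not None and seg.end - current.start <= max_sec:
            # Extending the current clip still fits within the upper bound.
            current = Segment(current.start, seg.end, f"{current.text} {seg.text}")
        else:
            flush()
            current = Segment(seg.start, seg.end, seg.text)
    flush()
    return clips
```

A real pipeline would probably also split on long silences and re-check the transcript against the merged audio; this sketch only enforces the duration window.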
🔊 Audio Samples
🗣 Natural Sentence
"Suatu hari nanti, suara ini mungkin tidak bisa dibedakan lagi dari suara manusia asli."
🎧 Listen on vocaroo
🔢 Number Pronunciation (simple format)
"Serius?! Tiket konsernya habis dalam waktu 3 menit?!"
🎧 Listen on vocaroo
💸 Number Hallucination (millions format – still imperfect)
"Masa cuma buat beli kursi kantor aja harus bayar Rp 2.500.000,-?! Gila sih itu!"
🎧 Listen on vocaroo
⚠️ Reading large numbers (like millions) is still inaccurate due to limited examples in the training dataset.
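One possible workaround for the large-number limitation is to spell numbers out in words before the text reaches the model, so it never has to read digits. The sketch below does this for non-negative integers and rupiah amounts written with dots as thousands separators. It is not part of this model or of F5-TTS; the names `terbilang` and `normalize_numbers` are illustrative, and decimals, ordinals, dates, and phone numbers are not handled.

```python
import re

_SMALL = ["nol", "satu", "dua", "tiga", "empat", "lima", "enam",
          "tujuh", "delapan", "sembilan", "sepuluh", "sebelas"]
# (scale value, scale word), largest first.
_SCALES = [(10**9, "miliar"), (10**6, "juta")]


def _join(head: str, rest: int) -> str:
    """Append the spelled-out remainder, if there is one."""
    return head if rest == 0 else f"{head} {terbilang(rest)}"


def terbilang(n: int) -> str:
    """Spell out an integer 0 <= n < 10**12 in Indonesian."""
    if n < 0 or n >= 10**12:
        raise ValueError("supported range is 0 <= n < 10**12")
    if n < 12:
        return _SMALL[n]
    if n < 20:
        return f"{_SMALL[n - 10]} belas"
    if n < 100:
        return _join(f"{_SMALL[n // 10]} puluh", n % 10)
    if n < 200:
        return _join("seratus", n - 100)
    if n < 1000:
        return _join(f"{_SMALL[n // 100]} ratus", n % 100)
    if n < 2000:
        return _join("seribu", n - 1000)
    if n < 10**6:
        return _join(f"{terbilang(n // 1000)} ribu", n % 1000)
    for value, word in _SCALES:
        if n >= value:
            return _join(f"{terbilang(n // value)} {word}", n % value)
    raise AssertionError("unreachable")


_DIGITS = r"\d{1,3}(?:\.\d{3})+|\d+"
_RUPIAH = re.compile(rf"Rp\.?\s*({_DIGITS})(?:,-|,00)?")
_NUMBER = re.compile(_DIGITS)


def _to_int(digits: str) -> int:
    # Dots are thousands separators in Indonesian number formatting.
    return int(digits.replace(".", ""))


def normalize_numbers(text: str) -> str:
    """Replace rupiah amounts and plain integers with Indonesian words."""
    text = _RUPIAH.sub(lambda m: f"{terbilang(_to_int(m.group(1)))} rupiah", text)
    return _NUMBER.sub(lambda m: terbilang(_to_int(m.group(0))), text)


if __name__ == "__main__":
    print(normalize_numbers("harus bayar Rp 2.500.000,-?! Gila sih itu!"))
```

With this preprocessing, the sample sentence above would be synthesized from "dua juta lima ratus ribu rupiah" instead of the digit string. Whether that actually improves the audio for this model has not been tested here.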
🤝 License & Usage
This model is released for non-commercial use only. Feel free to explore, fine-tune, or give feedback!
Model tree for PapaRazi/Ijazah_Palsu_V2
Base model: SWivid/F5-TTS