lucasnewman/f5-tts-mlx

lucasnewman committed on Oct 28, 2024

Commit 663e962 · verified · 1 Parent(s): 866facc

Update README.md

Files changed (1)

README.md +49 -2

@@ -4,6 +4,53 @@ tags:
   - mlx
 ---
-This model has been converted from Pytorch to .safetensors for MLX.
-See [F5-TTS](https://huggingface.co/SWivid/F5-TTS) for the original checkpoint.

+# F5 TTS — MLX
+[F5 TTS](https://arxiv.org/abs/2410.06885) for the [MLX](https://github.com/ml-explore/mlx) framework.
+This model is reshaped for MLX from the original weights and is designed for use with [f5-tts-mlx](https://github.com/lucasnewman/f5-tts-mlx).
+F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).
+You can listen to a [sample here](https://s3.amazonaws.com/lucasnewman.datasets/f5tts/sample.wav) that was generated in ~11 seconds on an M3 Max MacBook Pro.
+See [F5-TTS](https://huggingface.co/SWivid/F5-TTS) for the original checkpoint.
+## Installation
+```bash
+pip install f5-tts-mlx
+```
+## Usage
+```bash
+python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."
+```
+If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file of around 5-10 seconds:
+```bash
+python -m f5_tts_mlx.generate \
+--text "The quick brown fox jumped over the lazy dog." \
+--ref-audio /path/to/audio.wav \
+--ref-text "This is the caption for the reference audio."
+```
+You can convert an audio file to the correct format with ffmpeg like this:
+```bash
+ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
+```
+See [here](https://github.com/lucasnewman/f5-tts-mlx/tree/main/f5_tts_mlx) for more options to customize generation.
+---
+You can load a pretrained model from Python like this:
+```python
+from f5_tts_mlx.generate import generate
+audio = generate(text = "Hello world.", ...)
+```
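
The README above asks for reference audio that is mono, 24 kHz, and roughly 5-10 seconds long. A minimal sketch of how one might check a WAV file against those requirements before passing it to `--ref-audio`, using only Python's standard `wave` module. The function name `check_reference_audio` and its parameters are hypothetical and are not part of f5-tts-mlx; the duration bounds simply mirror the "around 5-10 seconds" guidance and can be loosened.

```python
import wave


def check_reference_audio(path, expected_rate=24000, min_seconds=5.0, max_seconds=10.0):
    """Return a list of problems with the WAV file; an empty list means it looks suitable.

    Hypothetical helper, not part of f5-tts-mlx. It only reads the WAV header.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()

    # Duration in seconds is the frame count divided by the sample rate.
    duration = frames / rate if rate else 0.0

    problems = []
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    if not (min_seconds <= duration <= max_seconds):
        problems.append(
            f"expected {min_seconds}-{max_seconds} s of audio, found {duration:.2f} s"
        )
    return problems
```

If the returned list is not empty, the ffmpeg command shown in the README can be used to convert the file to the expected format.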