speechbrain/tts-mstacotron2-libritts

multi-speaker-tts


pradnya-hf-dev committed on Oct 16, 2023

Commit e425eb9 · 1 parent: e0781bc

Update README.md

Files changed (1)

README.md +27 -0

README.md CHANGED

@@ -34,6 +34,8 @@ Please notice that we encourage you to read our tutorials and learn more about
 ### Perform Text-to-Speech (TTS)
 ```
 import torchaudio
 from speechbrain.pretrained import MSTacotron2
@@ -57,6 +59,31 @@ waveforms = hifi_gan.decode_batch(mel_outputs)
 torchaudio.save("synthesized_sample.wav", waveforms.squeeze(1).cpu(), 22050)
 ```
 If you want to generate multiple sentences in one shot, you can do it this way:
 Note: The model internally reorders the input texts in decreasing order of length.
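Because the inputs are reordered, the outputs may not line up with the list you passed in. The helpers below are a minimal sketch that assumes the returned batch follows the sorted order and that length is measured in characters. The model may instead sort by phoneme or token count, so the length measure is left pluggable through `key`. `sort_by_length_desc` and `restore_order` are hypothetical helpers, not part of SpeechBrain.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sort_by_length_desc(
    texts: Sequence[str], key: Callable[[str], int] = len
) -> Tuple[List[str], List[int]]:
    """Sort texts by decreasing length; also return each sorted item's original index."""
    # sorted() stays stable with reverse=True, so equal-length texts keep input order
    order = sorted(range(len(texts)), key=lambda i: key(texts[i]), reverse=True)
    return [texts[i] for i in order], order


def restore_order(sorted_outputs: Sequence[T], order: Sequence[int]) -> List[T]:
    """Put outputs produced in sorted order back into the caller's original order."""
    restored: List[Optional[T]] = [None] * len(order)
    for sorted_pos, original_idx in enumerate(order):
        restored[original_idx] = sorted_outputs[sorted_pos]
    return restored  # type: ignore[return-value]


texts = ["Hi", "Mary had a little lamb", "Hello world"]
sorted_texts, order = sort_by_length_desc(texts)
# Stand-in for per-utterance model outputs, produced in sorted order
outputs = [f"mel({t})" for t in sorted_texts]
restored = restore_order(outputs, order)
```

Pre-sorting your own list this way should leave the model's internal reordering with nothing to change, provided its length measure agrees with `key`.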

 ### Perform Text-to-Speech (TTS)
+The following example converts text to speech using speaker voice characteristics extracted from a reference recording.
 ```
 import torchaudio
 from speechbrain.pretrained import MSTacotron2
 torchaudio.save("synthesized_sample.wav", waveforms.squeeze(1).cpu(), 22050)
 ```
+If you want to generate a random voice, you can use the following:
+```
+import torchaudio
+from speechbrain.pretrained import MSTacotron2
+from speechbrain.pretrained import HIFIGAN
+# Initialize TTS (mstacotron2) and vocoder (HiFi-GAN)
+ms_tacotron2 = MSTacotron2.from_hparams(source="speechbrain/tts-mstacotron2-libritts", savedir="tmpdir_tts")
+hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-libritts-22050Hz", savedir="tmpdir_vocoder")
+# Required input
+INPUT_TEXT = "Mary had a little lamb"
+# Running the Zero-Shot Multi-Speaker Tacotron2 model to generate a mel-spectrogram
+mel_outputs, mel_lengths, alignments = ms_tacotron2.generate_random_voice(INPUT_TEXT)
+# Running Vocoder (spectrogram-to-waveform)
+waveforms = hifi_gan.decode_batch(mel_outputs)
+# Save the waveform
+torchaudio.save("synthesized_sample.wav", waveforms.squeeze(1).cpu(), 22050)
+```
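The `22050` passed to `torchaudio.save` is the sample rate of the vocoder output (the checkpoint used is `tts-hifigan-libritts-22050Hz`). If the file is written at a different rate, playback speed and pitch will be wrong. As a dependency-free illustration of what that save step stores, the sketch below writes float samples in [-1, 1] as 16-bit mono PCM using Python's standard `wave` module. `save_mono_wav` is a hypothetical helper, and 16-bit PCM is a choice made here, not necessarily the encoding `torchaudio.save` uses.

```python
import io
import math
import struct
import wave
from typing import BinaryIO, Sequence, Union

SAMPLE_RATE = 22050  # same rate the README passes to torchaudio.save


def save_mono_wav(
    dest: Union[str, BinaryIO], samples: Sequence[float], sample_rate: int = SAMPLE_RATE
) -> None:
    """Write float samples in [-1.0, 1.0] as a 16-bit mono PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# One second of a 440 Hz tone as a stand-in for vocoder output
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
buffer = io.BytesIO()
save_mono_wav(buffer, tone)
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    info = (wav.getnchannels(), wav.getframerate(), wav.getnframes())
```

With the README's variables, something like `save_mono_wav("synthesized_sample.wav", waveforms.squeeze(1).cpu()[0].tolist())` would write the first utterance. This assumes `decode_batch` returns a `[batch, 1, time]` tensor, which is what the `squeeze(1)` in the README implies.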