---
library_name: transformers
license: cc
datasets:
- atlithor/talromur3_without_emotions
language:
- is
base_model:
- parler-tts/parler-tts-mini-multilingual-v1.1
pipeline_tag: text-to-speech
---

# Model Card for RepeaTTS-level-3

See [Emotive Icelandic](https://huggingface.co/atlithor/EmotiveIcelandic) for more information about this model and the data it was trained on.

The RepeaTTS series is trained on the same data as Emotive Icelandic, but without disclosing the emotive content of the utterances.

This model, level-3, corresponds to training on a double-refined subset of the original training corpus. The model can additionally be prompted with a "neutral" label or an intensity label:

- low intensity: the voice is less expressive
- high intensity: the voice is very expressive

## Usage

Use the code below to get started with the model.

```py
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = ParlerTTSForConditionalGeneration.from_pretrained("atlithor/RepeaTTS-level-3").to(device)
# Tokenizer for the Icelandic text prompt
tokenizer = AutoTokenizer.from_pretrained("atlithor/EmotiveIcelandic")
# Tokenizer for the English voice description (taken from the model's text encoder)
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

prompt = "Þetta er frábær hugmynd!"  # English: "This is a great idea!"
description = "The recording is of very high quality, with Ingrid's voice sounding clear and very close up. Ingrid speaks at very high intensity."

input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate the waveform and save it at the model's sampling rate
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("ingrid_intense.wav", audio_arr, model.config.sampling_rate)
```
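To try the "neutral", low-intensity, and high-intensity prompts side by side, it can help to build the description string programmatically. The sketch below is a hypothetical helper and is not part of the model's API. Only the high-intensity sentence is copied from the usage example; the wording of the "neutral" and low-intensity sentences is an assumption and may need adjusting to match the descriptions used in training.

```python
# Hypothetical helper for building description prompts.
# Only the "high" sentence comes from the usage example in this card;
# the "neutral" and "low" wordings are assumptions.
BASE_SENTENCE = (
    "The recording is of very high quality, with {speaker}'s voice "
    "sounding clear and very close up."
)

STYLE_SENTENCES = {
    "neutral": "{speaker} speaks in a neutral tone.",     # assumed wording
    "low": "{speaker} speaks at low intensity.",          # assumed wording
    "high": "{speaker} speaks at very high intensity.",   # from the usage example
}


def build_description(speaker: str, style: str) -> str:
    """Return a description string for the given speaker and style label."""
    if style not in STYLE_SENTENCES:
        raise ValueError(
            f"Unknown style {style!r}; expected one of {sorted(STYLE_SENTENCES)}"
        )
    base = BASE_SENTENCE.format(speaker=speaker)
    style_sentence = STYLE_SENTENCES[style].format(speaker=speaker)
    return f"{base} {style_sentence}"


if __name__ == "__main__":
    for label in ("neutral", "low", "high"):
        print(build_description("Ingrid", label))
```

The returned string can be passed to `description_tokenizer` in place of the hard-coded `description`.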

## Citation

_Coming later._