---
library_name: transformers
license: cc
datasets:
- atlithor/talromur3_without_emotions
language:
- is
base_model:
- parler-tts/parler-tts-mini-multilingual-v1.1
pipeline_tag: text-to-speech
---

# Model Card for RepeaTTS-level-1

See [Emotive Icelandic](https://huggingface.co/atlithor/EmotiveIcelandic) for more information about this model and the data it is trained on.

The RepeaTTS series is trained on the same data as Emotive Icelandic, but without the emotive content being disclosed during training.

This model, level-1, is the variant without any further refinement fine-tuning.

## Usage

Use the code below to get started with the model.

```py
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# load the model and the two tokenizers: one for the transcript prompt,
# one for the free-text speaker/style description
model = ParlerTTSForConditionalGeneration.from_pretrained("atlithor/RepeaTTS-level-1").to(device)
tokenizer = AutoTokenizer.from_pretrained("atlithor/EmotiveIcelandic")
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

prompt = "Þetta er frábær hugmynd!"  # English: this is a great idea!
description = "The recording is of very high quality, with Ingrid's voice sounding clear and very close up."

input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# generate the waveform and write it to disk
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("ingrid.wav", audio_arr, model.config.sampling_rate)
```
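
As in other Parler-TTS-style models, the free-text description steers the voice and recording conditions, so re-running generation with a different description changes how the same prompt is rendered. The sketch below, continuing from the snippet above, is a minimal illustration; the alternative description string is an assumption, and the attributes the model actually responds to (speaker name, pace, recording quality) follow the training data described on the Emotive Icelandic card.

```py
# minimal sketch, continuing from the example above:
# same prompt, different style description (illustrative wording, not a verified control string)
alt_description = "Ingrid speaks slowly, and the recording sounds slightly distant."

alt_ids = description_tokenizer(alt_description, return_tensors="pt").input_ids.to(device)
alt_generation = model.generate(input_ids=alt_ids, prompt_input_ids=prompt_input_ids)
sf.write("ingrid_alt.wav", alt_generation.cpu().numpy().squeeze(), model.config.sampling_rate)
```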
					
						
					
						
## Citation

_coming later_

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]