nvidia/parakeet-tdt-0.6b-v3


nithinraok committed 13 days ago

Commit 3f27873 · 1 Parent(s): e6830b1

update

Signed-off-by: nithinraok <[email protected]>

Files changed (1)

README.md +1 -3

README.md CHANGED

@@ -166,7 +166,7 @@ metrics:
 - wer
 ---
-# **Parakeet TDT 0.6B V3 (En)**
+# **<span style="color:#76b900;">🦜 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model</span>**
 <style>
 img {
@@ -178,8 +178,6 @@ img {
 | [![Model size](https://img.shields.io/badge/Params-0.6B-green#model-badge)](#model-architecture)
 | [![Language](https://img.shields.io/badge/Language-EU_Languages-blue#model-badge)](#datasets)
-## <span style="color:#76b900;">🦜 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model</span>
 ## <span style="color:#466f00;">Description:</span>
 `parakeet-tdt-0.6b-v3` is a 600-million-parameter multilingual automatic speech recognition (ASR) model designed for high-throughput speech-to-text transcription. It extends the [parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) model by expanding language support from English to 25 European languages. The model automatically detects the language of the audio and transcribes it without requiring additional prompting. It is part of a series of models that leverage the [Granary](https://huggingface.co/datasets/nvidia/Granary) [1, 2] multilingual corpus as their primary training dataset.