nvidia/stt_es_conformer_ctc_large

@@ -167,15 +167,15 @@ All the models in this collection are trained on a composite dataset (NeMo ASRSE
 The list of the available models in this collection is shown in the following table. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
-| Version | Tokenizer             | Vocabulary Size | MCV 7.0 Dev | MCV 7.0 Test | MLS Dev | MLS Test | Voxpopuli Dev | Voxpopuli Test | Train Dataset   |
-|---------|-----------------------|-----------------|-------------|--------------|---------|----------|---------------|----------------|-----------------|
-| 1.8.0   | SentencePiece Unigram | 1024            | 6.3         | 6.9          | 4.3     | 4.2      | 6.1           | 7.5            | NeMo ASRSET 2.0 |
 While deploying with [NVIDIA Riva](https://developer.nvidia.com/riva), you can combine this model with external language models to further improve WER. The WER(%) of the latest model with different language modeling techniques is reported in the following table.
-| Language Modeling | Training Dataset                                                             | MCV 7.0 Dev | MCV 7.0 Test | MLS Dev | MLS Test | Voxpopuli Dev | Voxpopuli Test | Comment                                                |
-|-------------------|------------------------------------------------------------------------------|-------------|--------------|---------|----------|---------------|----------------|--------------------------------------------------------|
-| N-gram LM         | Spanish News Crawl corpus (50M sentences) + NeMo ASRSET training transcripts | 5.0         | 5.5          | 3.6     | 3.6      | 5.5           | 6.7            | N=4, beam_width=128, n_gram_alpha=0.8, n_gram_beta=1.5 |
 ## Limitations
 Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
 ## Deployment with NVIDIA Riva

 The list of the available models in this collection is shown in the following table. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
+| Version | Tokenizer             | Vocabulary Size | MCV 7.0 Dev | MCV 7.0 Test | MLS Dev | MLS Test | Voxpopuli Dev | Voxpopuli Test | Fisher Dev | Fisher Test | Train Dataset   |
+|---------|-----------------------|-----------------|-------------|--------------|---------|----------|---------------|----------------|------------|-------------|-----------------|
+| 1.8.0   | SentencePiece Unigram | 1024            | 6.3         | 6.9          | 4.3     | 4.2      | 6.1           | 7.5            | 18.3       | 18.5        | NeMo ASRSET 2.0 |
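
For readers unfamiliar with the metric, the WER figures in the table above are the word-level edit distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation script used for this model; the function name `word_error_rate` and the example sentences are hypothetical, and no text normalization (casing, punctuation) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent between two whitespace-tokenized strings.

    Illustrative only: assumes both strings are already normalized.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four reference words.
example_wer = word_error_rate("hola como estas hoy", "hola como esta hoy")
```

Corpus-level WER is normally computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance values.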
 While deploying with [NVIDIA Riva](https://developer.nvidia.com/riva), you can combine this model with external language models to further improve WER. The WER(%) of the latest model with different language modeling techniques is reported in the following table.
+| Language Modeling | Training Dataset                                                             | MCV 7.0 Dev | MCV 7.0 Test | MLS Dev | MLS Test | Voxpopuli Dev | Voxpopuli Test | Fisher Dev | Fisher Test | Comment                                                |
+|-------------------|------------------------------------------------------------------------------|-------------|--------------|---------|----------|---------------|----------------|------------|-------------|--------------------------------------------------------|
+| N-gram LM         | Spanish News Crawl corpus (50M sentences) + NeMo ASRSET training transcripts | 5.0         | 5.5          | 3.6     | 3.6      | 5.5           | 6.7            | 17.4       | 17.5        | N=4, beam_width=128, n_gram_alpha=0.8, n_gram_beta=1.5 |
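
To put the two tables side by side, the relative WER reduction from adding the N-gram LM can be computed from the reported test-set figures. This is a small arithmetic sketch; the helper name `relative_wer_reduction` and the dictionary layout are hypothetical, while the numbers are copied from the greedy-decoding and N-gram LM rows.

```python
def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Test-set WER (%) as reported in the tables for version 1.8.0.
greedy_wer = {
    "MCV 7.0 Test": 6.9,
    "MLS Test": 4.2,
    "Voxpopuli Test": 7.5,
    "Fisher Test": 18.5,
}
ngram_lm_wer = {
    "MCV 7.0 Test": 5.5,
    "MLS Test": 3.6,
    "Voxpopuli Test": 6.7,
    "Fisher Test": 17.5,
}

# Relative reduction per test set, keyed by dataset name.
reductions = {
    name: relative_wer_reduction(greedy_wer[name], ngram_lm_wer[name])
    for name in greedy_wer
}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1f}% relative WER reduction")
```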
 ## Limitations
 Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
 ## Deployment with NVIDIA Riva