Update README.md
README.md CHANGED
@@ -32,7 +32,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 7.22
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -46,7 +46,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 14.52
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -60,7 +60,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 3.8
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -74,7 +74,7 @@ model-index:
     metrics:
    - name: Test WER
       type: wer
-      value:
+      value: 2.18
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -88,7 +88,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 4.51
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -102,7 +102,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 7.84
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -116,7 +116,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 10.29
 ---
 
 # HiTZ/Aholab's Bilingual Basque Spanish Speech-to-Text model Conformer-Transducer for IBERSPEECH 2024's BBS-S2TC
@@ -133,7 +133,7 @@ img {
 | [](#datasets)
 | [](#datasets)
 
-This model was specifically designed for a submission in the Bilingual Basque Spanish Speech to Text Challenge from the IBERSPEECH 2024 Albayzin evalutaions chalenges section. The
+This model was specifically designed for a submission to the Bilingual Basque Spanish Speech-to-Text Challenge of the IBERSPEECH 2024 Albayzin evaluations challenges section. Training was tuned for good performance on the challenge's evaluation splits, so performance on other splits is worse.
 
 This model transcribes speech in the lowercase Spanish alphabet, including spaces, and was trained on a composite dataset comprising 1462 hours of Spanish and Basque speech. The model was fine-tuned from a pre-trained Basque [stt_eu_conformer_transducer_large](https://huggingface.co/HiTZ/stt_eu_conformer_transducer_large) model using the [Nvidia NeMo](https://github.com/NVIDIA/NeMo) toolkit. It is an autoregressive "large" variant of Conformer, with around 119 million parameters.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.
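For readers of this model card, here is a minimal, hedged sketch of how a NeMo Conformer-Transducer checkpoint like the one described above can be loaded and run. The repo id below is a placeholder (the bilingual model's own id does not appear in this diff), and the exact `transcribe` return type varies slightly between NeMo releases.

```python
# Minimal usage sketch, not the README's official snippet.
# Assumes: pip install nemo_toolkit[asr]; the repo id below is a placeholder.
import nemo.collections.asr as nemo_asr

# Conformer-Transducer checkpoints load as RNNT BPE models in NeMo.
model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="HiTZ/stt_es_eu_conformer_transducer_large"  # hypothetical id
)

# Greedy decoding is the default, matching how the WER table was produced.
# Return type varies by NeMo version (list of strings or Hypothesis objects).
hyps = model.transcribe(["sample_es.wav", "sample_eu.wav"])  # 16 kHz mono WAV
print(hyps)
```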
@@ -194,7 +194,7 @@ The tokenizer for these model was built using the text transcripts of the train
 The performance of the ASR models is reported in terms of Word Error Rate (WER%) with greedy decoding in the following table.
 | Tokenizer             | Vocabulary Size | MCV 18.0 Test ES | MCV 18.1 Test EU | Basque Parliament Test ES | Basque Parliament Test EU | Basque Parliament Test BI | MLS Test ES | VoxPopuli ES | Train Dataset              |
 |-----------------------|-----------------|------------------|------------------|---------------------------|---------------------------|---------------------------|-------------|--------------|----------------------------|
-| SentencePiece Unigram | 128 |
+| SentencePiece Unigram | 128             | 14.52            | 7.22             | 2.18                      | 3.8                       | 4.51                      | 7.84        | 10.29        | Basque Parliament (1462 h) |
 
 ## Limitations
 Since this model was trained almost exclusively on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse on accented speech.
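The WER figures above are standard word-level edit-distance rates. A quick sketch of reproducing such a number from reference/hypothesis pairs, assuming the third-party `jiwer` package (any WER implementation computes the same quantity):

```python
# Corpus-level WER = (substitutions + deletions + insertions) / reference words.
# Assumes the third-party `jiwer` package (pip install jiwer).
import jiwer

references = ["kaixo mundua", "hola a todos"]  # ground-truth transcripts
hypotheses = ["kaixo mundu", "hola a todos"]   # model outputs

wer = jiwer.wer(references, hypotheses)
print(f"WER: {wer * 100:.2f}%")  # 20.00% here: 1 substitution over 5 words
```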
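Finally, on the table's "SentencePiece Unigram / 128" row: the hunk header notes the tokenizer was built from the training-set transcripts. A hedged sketch of that step with the `sentencepiece` package (file names are illustrative, and NeMo ships its own wrapper scripts for this):

```python
# Hedged sketch: build a 128-token SentencePiece unigram tokenizer from
# training transcripts, as the "SentencePiece Unigram | 128" table row implies.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train_transcripts.txt",  # one transcript per line (assumed layout)
    model_prefix="bbs_unigram128",
    vocab_size=128,
    model_type="unigram",
)

# Load the trained model and tokenize a sample sentence into subword pieces.
sp = spm.SentencePieceProcessor(model_file="bbs_unigram128.model")
print(sp.encode("kaixo mundua", out_type=str))
```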