Update README.md
README.md CHANGED
@@ -32,7 +32,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 7.22
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -46,7 +46,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 14.52
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -60,7 +60,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 3.8
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -74,7 +74,7 @@ model-index:
     metrics:
    - name: Test WER
       type: wer
-      value:
+      value: 2.18
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -88,7 +88,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 4.51
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -102,7 +102,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 7.84
   - task:
       type: Automatic Speech Recognition
       name: speech-recognition
@@ -116,7 +116,7 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 10.29
 ---
 
 # HiTZ/Aholab's Bilingual Basque Spanish Speech-to-Text model Conformer-Transducer for IBERSPEECH 2024's BBS-S2TC
@@ -133,7 +133,7 @@ img {
 | [](#datasets)
 | [](#datasets)
 
-This model was specifically designed for a submission in the Bilingual Basque Spanish Speech to Text Challenge from the IBERSPEECH 2024 Albayzin evalutaions chalenges section. The
+This model was specifically designed for a submission to the Bilingual Basque Spanish Speech-to-Text Challenge of the IBERSPEECH 2024 Albayzin evaluations challenges section. Training was tuned for good performance on the challenge's evaluation splits, so performance on other splits is worse.
 
 This model transcribes speech in the lowercase Spanish alphabet, including spaces, and was trained on a composite dataset comprising 1462 hours of Spanish and Basque speech. The model was fine-tuned from a pre-trained Basque [stt_eu_conformer_transducer_large](https://huggingface.co/HiTZ/stt_eu_conformer_transducer_large) model using the [Nvidia NeMo](https://github.com/NVIDIA/NeMo) toolkit. It is an autoregressive "large" variant of Conformer, with around 119 million parameters.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.
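For readers of this model card, here is a minimal, hedged sketch of how a NeMo Conformer-Transducer checkpoint like the one described above can be loaded and run. The repo id below is a placeholder (the bilingual model's own id does not appear in this diff), and the exact `transcribe` return type varies slightly between NeMo releases.

```python
# Minimal usage sketch, not the README's official snippet.
# Assumes: pip install nemo_toolkit[asr]; the repo id below is a placeholder.
import nemo.collections.asr as nemo_asr

# Conformer-Transducer checkpoints load as RNNT BPE models in NeMo.
model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    model_name="HiTZ/stt_es_eu_conformer_transducer_large"  # hypothetical id
)

# Greedy decoding is the default, matching how the WER table was produced.
# Return type varies by NeMo version (list of strings or Hypothesis objects).
hyps = model.transcribe(["sample_es.wav", "sample_eu.wav"])  # 16 kHz mono WAV
print(hyps)
```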
@@ -194,7 +194,7 @@ The tokenizer for these model was built using the text transcripts of the train
 The performance of the ASR models is reported in terms of Word Error Rate (WER%) with greedy decoding in the following table.
 | Tokenizer             | Vocabulary Size | MCV 18.0 Test ES | MCV 18.1 Test EU | Basque Parliament Test ES | Basque Parliament Test EU | Basque Parliament Test BI | MLS Test ES | VoxPopuli ES | Train Dataset              |
 |-----------------------|-----------------|------------------|------------------|---------------------------|---------------------------|---------------------------|-------------|--------------|----------------------------|
-| SentencePiece Unigram | 128 |
+| SentencePiece Unigram | 128             | 14.52            | 7.22             | 2.18                      | 3.8                       | 4.51                      | 7.84        | 10.29        | Basque Parliament (1462 h) |
 
 ## Limitations
 Since this model was trained almost exclusively on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse on accented speech.
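The WER figures above are standard word-level edit-distance rates. A quick sketch of reproducing such a number from reference/hypothesis pairs, assuming the third-party `jiwer` package (any WER implementation computes the same quantity):

```python
# Corpus-level WER = (substitutions + deletions + insertions) / reference words.
# Assumes the third-party `jiwer` package (pip install jiwer).
import jiwer

references = ["kaixo mundua", "hola a todos"]  # ground-truth transcripts
hypotheses = ["kaixo mundu", "hola a todos"]   # model outputs

wer = jiwer.wer(references, hypotheses)
print(f"WER: {wer * 100:.2f}%")  # 20.00% here: 1 substitution over 5 words
```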
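Finally, on the table's "SentencePiece Unigram / 128" row: the hunk header notes the tokenizer was built from the training-set transcripts. A hedged sketch of that step with the `sentencepiece` package (file names are illustrative, and NeMo ships its own wrapper scripts for this):

```python
# Hedged sketch: build a 128-token SentencePiece unigram tokenizer from
# training transcripts, as the "SentencePiece Unigram | 128" table row implies.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train_transcripts.txt",  # one transcript per line (assumed layout)
    model_prefix="bbs_unigram128",
    vocab_size=128,
    model_type="unigram",
)

# Load the trained model and tokenize a sample sentence into subword pieces.
sp = spm.SentencePieceProcessor(model_file="bbs_unigram128.model")
print(sp.encode("kaixo mundua", out_type=str))
```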