asierhv committed
Commit 61c5466 · verified · 1 Parent(s): 8a373cf

Update README.md

Files changed (1):
  1. README.md (+9 -9)
README.md CHANGED
@@ -32,7 +32,7 @@ model-index:
   metrics:
   - name: Test WER
     type: wer
-    value: 99.9
+    value: 7.22
   - task:
     type: Automatic Speech Recognition
     name: speech-recognition
@@ -46,7 +46,7 @@ model-index:
   metrics:
   - name: Test WER
     type: wer
-    value: 99.9
+    value: 14.52
   - task:
     type: Automatic Speech Recognition
     name: speech-recognition
@@ -60,7 +60,7 @@ model-index:
   metrics:
   - name: Test WER
     type: wer
-    value: 99.9
+    value: 3.8
  - task:
     type: Automatic Speech Recognition
     name: speech-recognition
@@ -74,7 +74,7 @@ model-index:
   metrics:
   - name: Test WER
     type: wer
-    value: 99.9
+    value: 2.18
   - task:
     type: Automatic Speech Recognition
     name: speech-recognition
@@ -88,7 +88,7 @@ model-index:
   metrics:
   - name: Test WER
     type: wer
-    value: 99.9
+    value: 4.51
   - task:
     type: Automatic Speech Recognition
     name: speech-recognition
@@ -102,7 +102,7 @@ model-index:
   metrics:
   - name: Test WER
     type: wer
-    value: 99.9
+    value: 7.84
   - task:
     type: Automatic Speech Recognition
     name: speech-recognition
@@ -116,7 +116,7 @@ model-index:
   metrics:
   - name: Test WER
     type: wer
-    value: 99.9
+    value: 10.29
 ---
 
 # HiTZ/Aholab's Bilingual Basque Spanish Speech-to-Text model Conformer-Transducer for IBERSPEECH 2024's BBS-S2TC
@@ -133,7 +133,7 @@ img {
 | [![Language](https://img.shields.io/badge/Language-eu-lightgrey#model-badge)](#datasets)
 | [![Language](https://img.shields.io/badge/Language-es-lightgrey#model-badge)](#datasets)
 
-This model was specifically designed for a submission in the Bilingual Basque Spanish Speech to Text Challenge from the IBERSPEECH 2024 Albayzin evalutaions chalenges section. The trained was fitted for a good performance on the challenge's evaluation splits.
+This model was specifically designed for a submission to the Bilingual Basque Spanish Speech-to-Text Challenge in the IBERSPEECH 2024 Albayzin evaluations challenges section. Training was tuned for good performance on the challenge's evaluation splits, so performance on other splits is worse.
 
 This model transcribes speech in the lowercase Spanish alphabet, including spaces, and was trained on a composite dataset comprising 1462 hours of Spanish and Basque speech. The model was fine-tuned from a pre-trained Basque [stt_eu_conformer_transducer_large](https://huggingface.co/HiTZ/stt_eu_conformer_transducer_large) model using the [Nvidia NeMo](https://github.com/NVIDIA/NeMo) toolkit. It is an autoregressive "large" variant of Conformer, with around 119 million parameters.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.
@@ -194,7 +194,7 @@ The tokenizer for this model was built using the text transcripts of the train
 Performance of the ASR models is reported in terms of Word Error Rate (WER%) with greedy decoding in the following table.
 | Tokenizer | Vocabulary Size | MCV 18.0 Test ES | MCV 18.1 Test EU | Basque Parliament Test ES | Basque Parliament Test EU | Basque Parliament Test BI | MLS Test ES | VoxPopuli ES | Train Dataset |
 |-----------------------|-----------------|------------------|------------------|---------------------------|---------------------------|---------------------------|-------------|--------------|----------------------------|
-| SentencePiece Unigram | 128 | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 | Basque Palriament (1462 h) |
+| SentencePiece Unigram | 128 | 14.52 | 7.22 | 2.18 | 3.8 | 4.51 | 7.84 | 10.29 | Basque Parliament (1462 h) |
 
 ## Limitations
 Since this model was trained almost entirely on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular that the model has not been trained on. The model might also perform worse on accented speech.
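The diff above describes a NeMo Conformer-Transducer ("large", ~119 M parameters) fine-tuned from stt_eu_conformer_transducer_large. As a minimal, hedged sketch of how such a checkpoint is typically loaded for inference: the `.nemo` filename and audio path below are placeholders, not taken from this repository, and the return shape of `transcribe()` varies between NeMo versions.

```python
# Minimal inference sketch for a NeMo Conformer-Transducer checkpoint.
# Assumptions: a local .nemo export of this model (placeholder filename)
# and a 16 kHz mono WAV file; neither name comes from the README.
import nemo.collections.asr as nemo_asr

# Conformer-Transducer checkpoints are RNNT models with a BPE tokenizer.
asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(
    "bilingual_eu_es_conformer_transducer_large.nemo"  # placeholder path
)

# Greedy decoding, matching how the WER figures in the diff were measured.
# Depending on the NeMo version, transcribe() may return a plain list of
# strings or a (best_hypotheses, all_hypotheses) tuple for transducers.
transcriptions = asr_model.transcribe(["audio.wav"])
print(transcriptions[0])
```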
 
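For reference, the WER percentages in the updated metrics and table follow the standard definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A generic, self-contained sketch, not the challenge's official scorer:

```python
# Word Error Rate via word-level Levenshtein distance.
# WER = (substitutions + deletions + insertions) / reference word count.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word out of four reference words -> 25.00% WER.
print(f"{100 * wer('kaixo zer moduz zaude', 'kaixo zer moduz dago'):.2f}")
```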