seraphina committed on
Commit 81daced · verified · 1 Parent(s): 7924637

Update README.md

Files changed (1)
  1. README.md +10 -6
README.md CHANGED
@@ -13,7 +13,7 @@ pipeline_tag: automatic-speech-recognition
 
 
 A 17.31M parameter multilingual linear projector trained for automatic speech recognition (ASR) using the SLAM-ASR speechLLM framework.
-Within this framework, only the linear projector was trained alongisde a frozen speech encoder ([Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo))
+Within this framework, only the linear projector was trained alongside a frozen speech encoder ([Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo))
 and frozen LLM ([EuroLLM-1.7B](https://huggingface.co/utter-project/EuroLLM-1.7B)).
 
 - **Developed by:** SpeechTek Unit at Fondazione Bruno Kessler
@@ -28,7 +28,7 @@ This model is trained for Automatic Speech Recognition (ASR).
 
 ## How to Get Started with the Model
 
-This linear projector can be used using the shell scripts provided in the [SLAM-ASR](https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/asr_librispeech) codebase. Kindly refer to the instructions there with regards to data preparation and decoding.
+This linear projector can be used with the shell scripts provided in the [SLAM-ASR](https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/asr_librispeech) codebase. Kindly refer to the instructions there for further details.
 
 Whisper-large-v3-turbo and EuroLLM 1.7B must be downloaded before using this linear projector.
 
@@ -41,11 +41,12 @@ Specifically, the training set consisted of 92.5 hours of Common Voice data + 7.
 
 ### Training Procedure
 
-The linear projector was trained using the code-based provided by the official [SLAM-ASR Github repository](https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/asr_librispeech) with `torchrun`.
-Only the linear projector was trained. The whisper-large-v3-turbo speech encoder (Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo))
+* The linear projector was trained using the codebase provided by the official [SLAM-ASR GitHub repository](https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/asr_librispeech) with `torchrun`.
+* Only the linear projector was trained.
+* The speech encoder ([Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo))
 and LLM ([EuroLLM-1.7B](https://huggingface.co/utter-project/EuroLLM-1.7B)) were kept frozen.
-
-Training was conducted with one NVIDIA Ada Lovelace L40S GPU.
+* No prompt was used during training or inference.
+* Training was conducted with one NVIDIA Ada Lovelace L40S GPU.
 
 
 #### Training Hyperparameters
@@ -79,6 +80,9 @@ Training was conducted with one NVIDIA Ada Lovelace L40S GPU.
 
 ## Evaluation
 
+The model was evaluated using the Word Error Rate (WER) metric from the `evaluate` library.
+Prior to computing the WER, ground-truth and predicted transcripts were preprocessed with Whisper's `EnglishTextNormalizer` for English and the `BasicTextNormalizer` for all other languages.
+
 ### Results
 
 [More Information Needed]
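
For context on the 17.31M parameter figure in the card: the diff does not state the projector's architecture, but assuming the SLAM-ASR default linear projector (concatenate k=5 adjacent encoder frames, then two linear layers with a 2048-dim hidden size, mapping Whisper-large-v3-turbo's 1280-dim encoder output to EuroLLM-1.7B's 2048-dim embedding space — all of these dimensions are assumptions, not taken from this card), the count works out as a quick arithmetic sketch:

```python
# Hypothetical sketch: parameter count of a SLAM-ASR-style linear projector.
# All dimensions are assumptions (common SLAM-ASR defaults), not from the card:
#   encoder_dim = 1280 (Whisper-large-v3-turbo), k = 5 frame concatenation,
#   hidden = 2048, llm_dim = 2048 (EuroLLM-1.7B embedding size).

def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus bias of a single linear layer of shape (d_in, d_out)."""
    return d_in * d_out + d_out

encoder_dim, k, hidden, llm_dim = 1280, 5, 2048, 2048

# Linear(encoder_dim * k, hidden) -> ReLU -> Linear(hidden, llm_dim)
total = linear_params(encoder_dim * k, hidden) + linear_params(hidden, llm_dim)
print(total)                 # 17305600 parameters
print(f"{total / 1e6:.2f}M")  # 17.31M
```

Under these assumed dimensions the total comes to 17,305,600, i.e. 17.31M, consistent with the figure stated in the card.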
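
The evaluation recipe added in this commit (normalize, then compute WER) can be illustrated with a self-contained sketch. This is a stand-in, not the card's actual pipeline: it implements WER directly rather than calling the `evaluate` library, and uses a crude lowercase/strip-punctuation normalization in the spirit of `BasicTextNormalizer`, not the real Whisper normalizers (which also handle numbers, diacritics, etc.):

```python
import re

def normalize(text: str) -> str:
    """Crude stand-in for BasicTextNormalizer: lowercase, drop punctuation,
    collapse whitespace. The actual Whisper normalizers do more than this."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    # Single-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)] / len(ref)

print(wer("Hello, world!", "hello word"))  # 0.5: one substitution over two reference words
```

Normalizing both sides before scoring matters: without it, "Hello, world!" vs. "hello world" would count two spurious errors that are purely casing and punctuation.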