A 17.31M parameter multilingual linear projector trained for automatic speech recognition (ASR) using the SLAM-ASR speechLLM framework.
Within this framework, only the linear projector was trained alongside a frozen speech encoder ([Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)) and a frozen LLM ([EuroLLM-1.7B](https://huggingface.co/utter-project/EuroLLM-1.7B)).

- **Developed by:** SpeechTek Unit at Fondazione Bruno Kessler
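For orientation, the 17.31M figure matches the default SLAM-ASR projector shape: every 5 consecutive 1280-dim encoder frames are concatenated and passed through two linear layers into a 2048-dim LLM embedding space. The dimensions and downsampling factor below are assumptions inferred from that default, not stated on this card:

```python
# Assumed SLAM-ASR-style projector: concatenate k encoder frames, then two linear layers.
# enc_dim (Whisper-large-v3-turbo) and llm_dim (EuroLLM-1.7B) are assumptions.
enc_dim, llm_dim, k = 1280, 2048, 5
hidden = 2048  # assumed projector hidden size

first = (enc_dim * k) * hidden + hidden  # Linear(6400 -> 2048): weights + bias
second = hidden * llm_dim + llm_dim      # Linear(2048 -> 2048): weights + bias
total = first + second

print(f"{total} parameters = {total / 1e6:.2f}M")  # 17305600 parameters = 17.31M
```

If the checkpoint uses a different downsampling factor or hidden size, the arithmetic changes accordingly.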
This model is trained for Automatic Speech Recognition (ASR).

## How to Get Started with the Model

This linear projector can be used with the shell scripts provided in the [SLAM-ASR](https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/asr_librispeech) codebase. Kindly refer to the instructions there for further details.

Whisper-large-v3-turbo and EuroLLM 1.7B must be downloaded before using this linear projector.
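One way to fetch both frozen checkpoints ahead of time is `huggingface_hub.snapshot_download`; a sketch, where the `models/` directory layout is a placeholder and the SLAM-ASR configs must point at wherever the checkpoints actually live:

```python
# Third-party dependency: huggingface_hub (pip install huggingface_hub).
# Repo ids are taken from this card; the models/ layout is a placeholder.
REPOS = ("openai/whisper-large-v3-turbo", "utter-project/EuroLLM-1.7B")

def fetch_all(root: str = "models") -> list[str]:
    """Download both frozen checkpoints and return their local paths."""
    from huggingface_hub import snapshot_download
    return [
        snapshot_download(repo_id, local_dir=f"{root}/{repo_id.split('/')[-1]}")
        for repo_id in REPOS
    ]
```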
Specifically, the training set consisted of 92.5 hours of Common Voice data + 7.

### Training Procedure

* The linear projector was trained using the codebase provided by the official [SLAM-ASR GitHub repository](https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/asr_librispeech) with `torchrun`.
* Only the linear projector was trained; the speech encoder ([Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)) and the LLM ([EuroLLM-1.7B](https://huggingface.co/utter-project/EuroLLM-1.7B)) were kept frozen.
* No prompt was used during training or inference.
* Training was conducted on one NVIDIA Ada Lovelace L40S GPU.

#### Training Hyperparameters
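The frozen-versus-trainable split above can be sketched in PyTorch with tiny stand-in modules; the real components are the Whisper-large-v3-turbo encoder, EuroLLM-1.7B, and the projector, and the sizes here are illustrative only:

```python
import torch.nn as nn

# Tiny stand-ins for the three components; only the projector stays trainable.
encoder = nn.Linear(16, 16)    # frozen stand-in for Whisper-large-v3-turbo
llm = nn.Linear(16, 16)        # frozen stand-in for EuroLLM-1.7B
projector = nn.Linear(16, 16)  # the only module the optimizer updates

for module in (encoder, llm):
    module.requires_grad_(False)  # freeze weights and biases

trainable = sum(p.numel() for p in projector.parameters() if p.requires_grad)
frozen = sum(p.numel() for m in (encoder, llm)
             for p in m.parameters() if not p.requires_grad)
print(trainable, frozen)  # 272 trainable vs. 544 frozen (toy sizes)
```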
## Evaluation

The model was evaluated using the Word Error Rate (WER) metric from the `evaluate` library.
Prior to computing the WER, ground-truth and predicted transcripts were normalized with Whisper's `EnglishTextNormalizer` for English and `BasicTextNormalizer` for all other languages.
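For reference, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. A minimal stdlib sketch (the card's scores come from the `evaluate` library, and the crude lowercase/punctuation normalization below is only a stand-in for the Whisper normalizers):

```python
import re

def normalize(text: str) -> str:
    # Crude stand-in for BasicTextNormalizer: lowercase, strip punctuation.
    return re.sub(r"[^\w\s]", "", text.lower()).strip()

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    # Word-level Levenshtein distance via dynamic programming.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / len(ref)

print(wer("the cat sat", "the cat sat"))   # 0.0
print(wer("The cat sat.", "the bat sat"))  # 1 substitution / 3 words = 0.333...
```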
### Results

[More Information Needed]