A 17.31M parameter multilingual linear projector trained for automatic speech recognition (ASR) using the SLAM-ASR speechLLM framework.
Within this framework, only the linear projector was trained alongside a frozen speech encoder ([Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)) and a frozen LLM ([EuroLLM-1.7B](https://huggingface.co/utter-project/EuroLLM-1.7B)).

- **Developed by:** SpeechTek Unit at Fondazione Bruno Kessler
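For orientation, the 17.31M figure matches the default SLAM-ASR projector shape: every 5 consecutive 1280-dim encoder frames are concatenated and passed through two linear layers into a 2048-dim LLM embedding space. The dimensions and downsampling factor below are assumptions inferred from that default, not stated on this card:

```python
# Assumed SLAM-ASR-style projector: concatenate k encoder frames, then two linear layers.
# enc_dim (Whisper-large-v3-turbo) and llm_dim (EuroLLM-1.7B) are assumptions.
enc_dim, llm_dim, k = 1280, 2048, 5
hidden = 2048  # assumed projector hidden size

first = (enc_dim * k) * hidden + hidden  # Linear(6400 -> 2048): weights + bias
second = hidden * llm_dim + llm_dim      # Linear(2048 -> 2048): weights + bias
total = first + second

print(f"{total} parameters = {total / 1e6:.2f}M")  # 17305600 parameters = 17.31M
```

If the checkpoint uses a different downsampling factor or hidden size, the arithmetic changes accordingly.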
This model is trained for Automatic Speech Recognition (ASR).

## How to Get Started with the Model

This linear projector can be used with the shell scripts provided in the [SLAM-ASR](https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/asr_librispeech) codebase. Kindly refer to the instructions there for further details.

Whisper-large-v3-turbo and EuroLLM 1.7B must be downloaded before using this linear projector.
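One way to fetch both frozen checkpoints ahead of time is `huggingface_hub.snapshot_download`; a sketch, where the `models/` directory layout is a placeholder and the SLAM-ASR configs must point at wherever the checkpoints actually live:

```python
# Third-party dependency: huggingface_hub (pip install huggingface_hub).
# Repo ids are taken from this card; the models/ layout is a placeholder.
REPOS = ("openai/whisper-large-v3-turbo", "utter-project/EuroLLM-1.7B")

def fetch_all(root: str = "models") -> list[str]:
    """Download both frozen checkpoints and return their local paths."""
    from huggingface_hub import snapshot_download
    return [
        snapshot_download(repo_id, local_dir=f"{root}/{repo_id.split('/')[-1]}")
        for repo_id in REPOS
    ]
```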
Specifically, the training set consisted of 92.5 hours of Common Voice data + 7.

### Training Procedure

* The linear projector was trained using the codebase provided by the official [SLAM-ASR GitHub repository](https://github.com/X-LANCE/SLAM-LLM/tree/main/examples/asr_librispeech) with `torchrun`.
* Only the linear projector was trained; the speech encoder ([Whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo)) and the LLM ([EuroLLM-1.7B](https://huggingface.co/utter-project/EuroLLM-1.7B)) were kept frozen.
* No prompt was used during training or inference.
* Training was conducted on one NVIDIA Ada Lovelace L40S GPU.

#### Training Hyperparameters
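The frozen-versus-trainable split above can be sketched in PyTorch with tiny stand-in modules; the real components are the Whisper-large-v3-turbo encoder, EuroLLM-1.7B, and the projector, and the sizes here are illustrative only:

```python
import torch.nn as nn

# Tiny stand-ins for the three components; only the projector stays trainable.
encoder = nn.Linear(16, 16)    # frozen stand-in for Whisper-large-v3-turbo
llm = nn.Linear(16, 16)        # frozen stand-in for EuroLLM-1.7B
projector = nn.Linear(16, 16)  # the only module the optimizer updates

for module in (encoder, llm):
    module.requires_grad_(False)  # freeze weights and biases

trainable = sum(p.numel() for p in projector.parameters() if p.requires_grad)
frozen = sum(p.numel() for m in (encoder, llm)
             for p in m.parameters() if not p.requires_grad)
print(trainable, frozen)  # 272 trainable vs. 544 frozen (toy sizes)
```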
## Evaluation

The model was evaluated using the Word Error Rate (WER) metric from the `evaluate` library.
Prior to computing the WER, ground-truth and predicted transcripts were normalized with Whisper's `EnglishTextNormalizer` for English and `BasicTextNormalizer` for all other languages.
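For reference, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. A minimal stdlib sketch (the card's scores come from the `evaluate` library, and the crude lowercase/punctuation normalization below is only a stand-in for the Whisper normalizers):

```python
import re

def normalize(text: str) -> str:
    # Crude stand-in for BasicTextNormalizer: lowercase, strip punctuation.
    return re.sub(r"[^\w\s]", "", text.lower()).strip()

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    # Word-level Levenshtein distance via dynamic programming.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / len(ref)

print(wer("the cat sat", "the cat sat"))   # 0.0
print(wer("The cat sat.", "the bat sat"))  # 1 substitution / 3 words = 0.333...
```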
### Results

[More Information Needed]