Add pipeline tag, library name and link to Github repo (#1)
- Add pipeline tag, library name and link to Github repo (405534c1def32542f94cb3e825bd9fc9c126f31e)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
```diff
@@ -1,19 +1,21 @@
 ---
-license: cc-by-4.0
-language:
-- en
-- it
 datasets:
 - FBK-MT/mosel
 - facebook/covost2
 - openslr/librispeech_asr
 - facebook/voxpopuli
+language:
+- en
+- it
+license: cc-by-4.0
 metrics:
 - wer
 tags:
 - speech
 - speech recognition
 - ASR
+library_name: transformers
+pipeline_tag: automatic-speech-recognition
 ---
 
 # FAMA-small-asr
@@ -40,7 +42,6 @@ All the artifacts used for realizing FAMA models, including codebase, datasets,
 themself are [released under OS-compliant licenses](#license), promoting a more
 responsible creation of models in our community.
 
-
 It is available in 2 sizes, with 2 variants for ASR only:
 
 - [FAMA-small](https://huggingface.co/FBK-MT/fama-small) - 475 million parameters
@@ -49,7 +50,7 @@ It is available in 2 sizes, with 2 variants for ASR only:
 - [FAMA-medium-asr](https://huggingface.co/FBK-MT/fama-medium-asr) - 878 million parameters
 
 For more information about FAMA, please check our [blog post](https://huggingface.co/blog/FAMA/release) and the [arXiv](https://arxiv.org/abs/2505.22759) preprint.
-
+The code is available in the [Github repository](https://github.com/hlt-mt/FBK-fairseq).
 
 ## Usage
 
@@ -124,7 +125,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 - FAMA achieves up to 4.2 WER improvement on average across languages compared to OWSM v3.1
 - FAMA is up to 8 times faster than Whisper large-v3 while achieving comparable performance
 
-
 ### Automatic Speech Recogniton (ASR)
 | ***Model/Dataset WER (↓)*** | **CommonVoice**-*en* | **CommonVoice**-*it* | **MLS**-*en* | **MLS**-*it* | **VoxPopuli**-*en* | **VoxPopuli**-*it* | **AVG**-*en* | **AVG**-*it* |
 |-----------------------------------------|---------|---------|---------|---------|---------|----------|---------|----------|
@@ -138,7 +138,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 | FAMA *small* | 13.7 | 8.6 | 5.8 | 12.8 | 7.3 | **15.6** | 8.9 | 12.3 |
 | FAMA *medium* | 11.5 | 7.0 | 5.2 | 13.9 | 7.2 | 15.9 | 8.0 | 12.3 |
 
-
 ### Computational Time and Maximum Batch Size
 
 | ***Model*** | ***Batch Size*** | ***xRTF en (↑)*** | ***xRTF it (↑)*** | ***xRTF AVG (↑)*** |
@@ -150,7 +149,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 | FAMA *small* | 16 | **57.4** | **56.0** | **56.7** |
 | FAMA *medium* | 8 | 39.5 | 41.2 | 40.4 |
 
-
 ## License
 
 We release the FAMA model weights, and training data under the CC-BY 4.0 license.
```
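The README's metadata lists `wer` under `metrics`, and its tables report per-dataset WER and xRTF. For reference, here is a minimal sketch of both computations, using the textbook definitions (word-level Levenshtein distance for WER; xRTF taken as seconds of audio per second of compute). This is an illustration, not code from the FAMA repository:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table for Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (free if words match)
            ))
        prev = curr
    return prev[-1] / len(ref)


def xrtf(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    return audio_seconds / wall_clock_seconds
```

With these definitions, one substituted word in a three-word reference gives a WER of 1/3, and a system that transcribes 80 seconds of audio in 2 seconds of wall-clock time has an xRTF of 40.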