Add pipeline tag, library name and link to Github repo (#1)
- Add pipeline tag, library name and link to Github repo (405534c1def32542f94cb3e825bd9fc9c126f31e)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
```diff
@@ -1,19 +1,21 @@
 ---
-license: cc-by-4.0
-language:
-- en
-- it
 datasets:
 - FBK-MT/mosel
 - facebook/covost2
 - openslr/librispeech_asr
 - facebook/voxpopuli
+language:
+- en
+- it
+license: cc-by-4.0
 metrics:
 - wer
 tags:
 - speech
 - speech recognition
 - ASR
+library_name: transformers
+pipeline_tag: automatic-speech-recognition
 ---
 
 # FAMA-small-asr
@@ -40,7 +42,6 @@ All the artifacts used for realizing FAMA models, including codebase, datasets,
 themself are [released under OS-compliant licenses](#license), promoting a more
 responsible creation of models in our community.
 
-
 It is available in 2 sizes, with 2 variants for ASR only:
 
 - [FAMA-small](https://huggingface.co/FBK-MT/fama-small) - 475 million parameters
@@ -49,7 +50,7 @@ It is available in 2 sizes, with 2 variants for ASR only:
 - [FAMA-medium-asr](https://huggingface.co/FBK-MT/fama-medium-asr) - 878 million parameters
 
 For more information about FAMA, please check our [blog post](https://huggingface.co/blog/FAMA/release) and the [arXiv](https://arxiv.org/abs/2505.22759) preprint.
-
+The code is available in the [Github repository](https://github.com/hlt-mt/FBK-fairseq).
 
 ## Usage
 
@@ -124,7 +125,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 - FAMA achieves up to 4.2 WER improvement on average across languages compared to OWSM v3.1
 - FAMA is up to 8 times faster than Whisper large-v3 while achieving comparable performance
 
-
 ### Automatic Speech Recogniton (ASR)
 | ***Model/Dataset WER (↓)*** | **CommonVoice**-*en* | **CommonVoice**-*it* | **MLS**-*en* | **MLS**-*it* | **VoxPopuli**-*en* | **VoxPopuli**-*it* | **AVG**-*en* | **AVG**-*it* |
 |-----------------------------------------|---------|---------|---------|---------|---------|----------|---------|----------|
@@ -138,7 +138,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 | FAMA *small* | 13.7 | 8.6 | 5.8 | 12.8 | 7.3 | **15.6** | 8.9 | 12.3 |
 | FAMA *medium* | 11.5 | 7.0 | 5.2 | 13.9 | 7.2 | 15.9 | 8.0 | 12.3 |
 
-
 ### Computational Time and Maximum Batch Size
 
 | ***Model*** | ***Batch Size*** | ***xRTF en (↑)*** | ***xRTF it (↑)*** | ***xRTF AVG (↑)*** |
@@ -150,7 +149,6 @@ We also benchmark FAMA in terms of computational time and maximum batch size sup
 | FAMA *small* | 16 | **57.4** | **56.0** | **56.7** |
 | FAMA *medium* | 8 | 39.5 | 41.2 | 40.4 |
 
-
 ## License
 
 We release the FAMA model weights, and training data under the CC-BY 4.0 license.
```
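The README's metadata lists `wer` under `metrics`, and its tables report per-dataset WER and xRTF. For reference, here is a minimal sketch of both computations, using the textbook definitions (word-level Levenshtein distance for WER; xRTF taken as seconds of audio per second of compute). This is an illustration, not code from the FAMA repository:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table for Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (free if words match)
            ))
        prev = curr
    return prev[-1] / len(ref)


def xrtf(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    return audio_seconds / wall_clock_seconds
```

With these definitions, one substituted word in a three-word reference gives a WER of 1/3, and a system that transcribes 80 seconds of audio in 2 seconds of wall-clock time has an xRTF of 40.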