DeCRED-base
This is a 174M-parameter encoder-decoder E-Branchformer model trained with a decoder-centric regularisation technique on 6,000 hours of open-source, normalised English data.
It achieves Word Error Rates (WERs) comparable to openai/whisper-medium across multiple datasets while using roughly a quarter of the parameters.
Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.
Disclaimer: the model currently produces insertions on utterances containing only silence, as it was not trained on such data. A fix will be added soon.
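Until that fix lands, one way to avoid the failure mode is to gate out silence-only inputs before transcription. The sketch below is an illustrative workaround, not part of the official usage: the soundfile dependency and the RMS threshold are assumptions, and pipe is the ASR pipeline constructed in the usage example that follows.

import numpy as np
import soundfile as sf

# Illustrative workaround: skip utterances that are effectively silent.
# The 1e-4 RMS threshold is a placeholder, not a tuned value.
def transcribe_if_voiced(pipe, path, rms_threshold=1e-4):
    audio, _ = sf.read(path)
    if np.sqrt(np.mean(np.square(audio))) < rms_threshold:
        return {"text": ""}  # treat silence-only input as an empty transcript
    return pipe(path)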
Usage
The model can be used with the pipeline class to transcribe audio files of arbitrary length.
from transformers import pipeline
model_id = "BUT-FIT/DeCRED-base"
pipe = pipeline("automatic-speech-recognition", model=model_id, feature_extractor=model_id, trust_remote_code=True)
# Newer versions of transformers (>4.31.0) contain a bug in pipeline
# inference-type detection; set the type explicitly and ignore the
# associated warning.
pipe.type = "seq2seq"
# Run beam-search decoding with the joint CTC-attention scorer (default)
result_beam = pipe("audio.wav")
# Run greedy decoding without the joint CTC-attention scorer
pipe.model.generation_config.ctc_weight = 0.0
pipe.model.generation_config.num_beams = 1
result_greedy = pipe("audio.wav")
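For long recordings, the pipeline's standard chunked inference can be used. The snippet below is a sketch: chunk_length_s and batch_size are standard arguments of the transformers ASR pipeline, but their interaction with this model's joint CTC-attention scorer has not been verified here.

# Chunked long-form inference (illustrative; parameter values are assumptions)
result_long = pipe("long_audio.wav", chunk_length_s=30, batch_size=8)
print(result_long["text"])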
Citation
If you use DeCRED in your research, please cite the following paper:
@misc{polok2024improvingautomaticspeechrecognition,
title={Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models},
author={Alexander Polok and Santosh Kesiraju and Karel Beneš and Lukáš Burget and Jan Černocký},
year={2024},
eprint={2410.17437},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2410.17437},
}
Evaluation results
All WERs are self-reported on the respective test sets.

| Dataset | Test WER (%) |
| --- | --- |
| LibriSpeech (clean) | 2.5 |
| LibriSpeech (other) | 5.6 |
| TED-LIUM v3 | 6.3 |
| VoxPopuli | 7.3 |
| Mozilla Common Voice 13.0 | 12.1 |
| FLEURS | 6.8 |
| Switchboard | 6.8 |
| Wall Street Journal | 1.3 |
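To sanity-check these numbers on your own data, WER can be computed with the jiwer package. A minimal sketch, assuming lower-cased references; the authors' exact text-normalisation recipe is not specified here.

import jiwer

# Compare a reference transcript against the model's output (illustrative).
reference = "the quick brown fox"
hypothesis = pipe("audio.wav")["text"].lower()
print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")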