speechbrain/asr-wav2vec2-transformer-aishell

@@ -6,6 +6,7 @@ tags:
 - CTC
 - Attention
 - Transformers
 - pytorch
 license: "apache-2.0"
 datasets:
@@ -18,10 +19,10 @@ metrics:
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
-# Transformer for AISHELL (Mandarin Chinese)
 This repository provides all the necessary tools to perform automatic speech
-recognition from an end-to-end system pretrained on AISHELL (Mandarin Chinese)
 within SpeechBrain. For a better experience, we encourage you to learn more about
 [SpeechBrain](https://speechbrain.github.io).
@@ -29,7 +30,7 @@ The performance of the model is the following:
 | Release | Dev CER | Test CER | GPUs | Full Results |
 |:-------------:|:--------------:|:--------------:|:--------:|:--------:|
-| 05-03-21 | 5.60 | 6.04 | 2xV100 32GB | [Google Drive](https://drive.google.com/drive/folders/1zlTBib0XEwWeyhaXDXnkqtPsIBI18Uzs?usp=sharing)|
@@ -38,10 +39,10 @@ The performance of the model is the following:
 This ASR system is composed of 2 different but linked blocks:
 - Tokenizer (unigram) that transforms words into subword units and is trained on
 the train transcriptions of LibriSpeech.
-- Acoustic model made of a transformer encoder and a joint decoder with CTC +
 transformer. Hence, the decoding also incorporates the CTC probabilities.
-To Train this system from scratch, [see our SpeechBrain recipe](https://github.com/speechbrain/speechbrain/tree/develop/recipes/AISHELL-1).
 ## Install SpeechBrain
@@ -59,17 +60,15 @@ Please notice that we encourage you to read our tutorials and learn more about
 ```python
 from speechbrain.pretrained import EncoderDecoderASR
-asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-transformer-aishell", savedir="pretrained_models/asr-transformer-aishell")
-asr_model.transcribe_file("speechbrain/asr-transformer-aishell/example_mandarin.wav")
 ```
 ### Inference on GPU
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
 ### Training
-The model was trained with SpeechBrain (Commit hash: '986a2175').
 To train it from scratch follow these steps:
 1. Clone SpeechBrain:
 ```bash
@@ -85,10 +84,10 @@ pip install -e .
 3. Run Training:
 ```bash
 cd recipes/AISHELL-1/ASR/transformer/
-python train.py hparams/train_ASR_transformer.yaml --data_folder=your_data_folder
 ```
-You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1QU18YoauzLOXueogspT0CgR5bqJ6zFfu?usp=sharing).
 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

 - CTC
 - Attention
 - Transformers
+- wav2vec2
 - pytorch
 license: "apache-2.0"
 datasets:
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
+# Transformer for AISHELL + wav2vec2 (Mandarin Chinese)
 This repository provides all the necessary tools to perform automatic speech
+recognition from an end-to-end system pretrained on AISHELL + wav2vec2 (Mandarin Chinese)
 within SpeechBrain. For a better experience, we encourage you to learn more about
 [SpeechBrain](https://speechbrain.github.io).
 | Release | Dev CER | Test CER | GPUs | Full Results |
 |:-------------:|:--------------:|:--------------:|:--------:|:--------:|
+| 05-03-21 | 5.19 | 5.58 | 2xV100 32GB | [Google Drive](https://drive.google.com/drive/folders/1zlTBib0XEwWeyhaXDXnkqtPsIBI18Uzs?usp=sharing)|
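The CER columns above report character error rate. As a rough illustration only, the sketch below assumes the conventional definition (character-level edit distance divided by reference length, in percent). It is not SpeechBrain's scoring code, and the helper names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent: 100 * edits / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

Under this definition, a hypothesis that drops one character from a six-character reference scores 100 / 6, roughly 16.7. Published numbers may additionally depend on text normalization that this sketch does not attempt.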
 This ASR system is composed of 2 different but linked blocks:
 - Tokenizer (unigram) that transforms words into subword units and is trained on
 the train transcriptions of LibriSpeech.
+- Acoustic model made of a wav2vec2 encoder and a joint decoder with CTC +
 transformer. Hence, the decoding also incorporates the CTC probabilities.
+To train this system from scratch, [see our SpeechBrain recipe](https://github.com/speechbrain/speechbrain/tree/develop/recipes/AISHELL-1/ASR/transformer).
 ## Install SpeechBrain
 ```python
 from speechbrain.pretrained import EncoderDecoderASR
+asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-wav2vec2-transformer-aishell", savedir="pretrained_models/asr-wav2vec2-transformer-aishell")
+asr_model.transcribe_file("speechbrain/asr-wav2vec2-transformer-aishell/example_mandarin.wav")
 ```
 ### Inference on GPU
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
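A minimal sketch of how the `run_opts` argument could be passed, reusing the `from_hparams` call shown earlier in this card. The helpers `pick_run_opts` and `load_asr` are hypothetical names introduced for illustration, and the CPU fallback is an added assumption, not part of the original instructions.

```python
import importlib.util


def pick_run_opts(prefer_gpu: bool = True) -> dict:
    """Build run_opts for from_hparams, falling back to CPU when CUDA is unavailable."""
    device = "cpu"
    if prefer_gpu and importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            device = "cuda"
    return {"device": device}


def load_asr(prefer_gpu: bool = True):
    """Load the pretrained model on the selected device (downloads it on first use)."""
    # Imported lazily so pick_run_opts can be used without SpeechBrain installed.
    from speechbrain.pretrained import EncoderDecoderASR

    return EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-wav2vec2-transformer-aishell",
        savedir="pretrained_models/asr-wav2vec2-transformer-aishell",
        run_opts=pick_run_opts(prefer_gpu),
    )
```

Calling `load_asr()` and then `transcribe_file(...)` on the returned object should behave like the CPU example, only on the chosen device.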
 ### Training
+The model was trained with SpeechBrain (Commit hash: '480dde87').
 To train it from scratch follow these steps:
 1. Clone SpeechBrain:
 ```bash
 3. Run Training:
 ```bash
 cd recipes/AISHELL-1/ASR/transformer/
+python train.py hparams/train_ASR_transformer_with_wav2vect.yaml --data_folder=your_data_folder
 ```
+You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1P3w5BnwLDxMHFQrkCZ5RYBZ1WsQHKFZr?usp=sharing).
 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.