tags:
  - espnet
  - audio
  - speech-recognition
language: zh
datasets:
  - commonvoice
license: cc-by-4.0
ESPnet2 ASR model
espnet/shihlun-asr-commonvoice-zh-TW
This model was trained by Shih-Lun Wu using the commonvoice recipe in espnet.
Demo: How to use in ESPnet2
cd espnet
pip install -e .
cd egs2/commonvoice/asr1
./asr.sh \
  --stage 1 \
  --stop_stage 13 \
  --nj 32 \
  --inference_nj 32 \
  --skip_train true \
  --train_set "train_zh_TW" \
  --valid_set "dev_zh_TW" \
  --test_sets "dev_zh_TW test_zh_TW" \
  --lang "zh_TW" \
  --local_data_opts "--lang zh-TW" \
  --speed_perturb_factors "0.9 1.0 1.1" \
  --lm_train_text "data/train_zh_TW/text" \
  --token_type bpe \
  --nbpe 2542 \
  --bpemode "unigram" \
  --bpe_train_text "data/train_zh_TW/text" \
  --use_lm false \
  --inference_asr_model "valid.acc.best.pth" \
  --download_model "espnet/shihlun-asr-commonvoice-zh-TW"
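Outside the recipe, the same checkpoint can usually be loaded directly from Python. The following is a minimal sketch, assuming `espnet`, `espnet_model_zoo`, and `soundfile` are installed and that the input is a mono WAV file at the model's sampling rate (likely 16 kHz, the recipe default, but check the model config). `transcribe` and `speech.wav` are hypothetical names used only for illustration.

```python
MODEL_TAG = "espnet/shihlun-asr-commonvoice-zh-TW"


def transcribe(wav_path: str, model_tag: str = MODEL_TAG) -> str:
    """Return the best hypothesis for one audio file (illustrative helper)."""
    # Imported inside the function so the heavy dependencies are only
    # needed when a transcription is actually requested.
    import soundfile
    from espnet2.bin.asr_inference import Speech2Text

    # Downloads and caches the model on first use (requires espnet_model_zoo).
    speech2text = Speech2Text.from_pretrained(model_tag=model_tag)

    # Assumption: the file is mono and already at the model's sampling rate.
    speech, _rate = soundfile.read(wav_path)

    # Each n-best entry starts with the decoded text; keep the top one.
    nbests = speech2text(speech)
    text, *_ = nbests[0]
    return text


# Example call (hypothetical file name):
# print(transcribe("speech.wav"))
```

When decoding many files, it is probably better to build the `Speech2Text` object once and reuse it rather than reloading the model on every call.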
RESULTS
Environments
- date: Thu Sep 1 21:49:10 UTC 2022
- python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0]
- espnet version: espnet 202207
- pytorch version: pytorch 1.12.1+cu102
- Git hash: 13db69d3befc3c82a5ff5a11e28bf79d5030603f
- Commit date: Mon Aug 29 13:44:35 2022 +0000
asr_train_asr_conformer5_raw_zh_TW_bpe2542_sp_lr1.0
CER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err | 
|---|---|---|---|---|---|---|---|---|
| inference_asr_model_valid.acc.best/dev_zh_TW | 2627 | 22200 | 97.7 | 2.1 | 0.2 | 0.0 | 2.4 | 9.5 | 
| inference_asr_model_valid.acc.best/test_zh_TW | 2627 | 21991 | 98.0 | 1.6 | 0.4 | 0.1 | 2.1 | 7.7 | 
TER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err | 
|---|---|---|---|---|---|---|---|---|
| inference_asr_model_valid.acc.best/dev_zh_TW | 2627 | 24827 | 98.6 | 1.2 | 0.2 | 0.0 | 1.5 | 4.0 | 
| inference_asr_model_valid.acc.best/test_zh_TW | 2627 | 24618 | 98.8 | 0.9 | 0.4 | 0.1 | 1.3 | 3.4 | 
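In the tables above, Corr, Sub, Del, and Ins are percentages of the reference units (characters for CER, BPE tokens for TER), and Err is their combined error, Sub + Del + Ins. Because each column is rounded independently, the sum can differ from Err by 0.1. The sketch below illustrates how such counts arise from a minimum-edit alignment. It is not the scoring tool used by the recipe, only a plain Levenshtein alignment, and the example sentence and the names `align_counts` and `error_rate` are made up for illustration.

```python
from typing import List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (correct, substitutions, deletions, insertions) for one
    minimum-edit alignment of hyp against ref."""
    # dp[i][j] = (cost, corr, sub, del, ins) for ref[:i] versus hyp[:j]
    dp: List[List[Tuple[int, int, int, int, int]]] = [
        [(0, 0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)
    ]
    for i in range(1, len(ref) + 1):  # hyp is empty: everything is deleted
        dp[i][0] = (i, 0, 0, i, 0)
    for j in range(1, len(hyp) + 1):  # ref is empty: everything is inserted
        dp[0][j] = (j, 0, 0, 0, j)

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (d[0], d[1] + 1, d[2], d[3], d[4])      # match
            else:
                diag = (d[0] + 1, d[1], d[2] + 1, d[3], d[4])  # substitution
            u = dp[i - 1][j]
            dele = (u[0] + 1, u[1], u[2], u[3] + 1, u[4])      # deletion
            left = dp[i][j - 1]
            ins = (left[0] + 1, left[1], left[2], left[3], left[4] + 1)  # insertion
            # Keep the cheapest path; ties prefer the diagonal move.
            dp[i][j] = min(diag, dele, ins, key=lambda t: t[0])

    _, corr, sub, dele_n, ins_n = dp[-1][-1]
    return corr, sub, dele_n, ins_n


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Error rate in percent: (Sub + Del + Ins) / number of reference units."""
    _, sub, dele_n, ins_n = align_counts(ref, hyp)
    return 100.0 * (sub + dele_n + ins_n) / len(ref)


# Made-up example: one substituted and one dropped character out of six.
ref_chars = list("今天天氣很好")
hyp_chars = list("今天天器很")
print(align_counts(ref_chars, hyp_chars))          # (4, 1, 1, 0)
print(round(error_rate(ref_chars, hyp_chars), 1))  # 33.3
```

Passing lists of characters gives a character error rate; passing lists of BPE tokens instead would give the token-level figure in the same way.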
