Automatic Speech Recognition
ESPnet
English
audio
eric102004 committed on
Commit dbff434 · 1 Parent(s): c062330

Update model

Files changed (23)
  1. README.md +407 -3
  2. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/RESULTS.md +70 -0
  3. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/config.yaml +263 -0
  4. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/acc.png +0 -0
  5. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/backward_time.png +0 -0
  6. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/cer.png +0 -0
  7. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/cer_ctc.png +0 -0
  8. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/clip.png +0 -0
  9. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/forward_time.png +0 -0
  10. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/gpu_max_alloc_mem_GB.png +0 -0
  11. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/gpu_max_cached_mem_GB.png +0 -0
  12. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/grad_norm.png +0 -0
  13. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/iter_time.png +0 -0
  14. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/loss.png +0 -0
  15. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/loss_att.png +0 -0
  16. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/loss_ctc.png +0 -0
  17. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/loss_scale.png +0 -0
  18. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/optim0_lr0.png +0 -0
  19. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/optim_step_time.png +0 -0
  20. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/train_time.png +0 -0
  21. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/wer.png +0 -0
  22. exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/valid.cer_ctc.ave_10best.pth +3 -0
  23. meta.yaml +8 -0
README.md CHANGED
@@ -1,3 +1,407 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - myst_ogi_cmu_kids
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/myst_ogi_cmu_kids_ctc_nolm`
+
+ This model was trained by eric102004 using the myst_ogi_cmu_kids recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout 6f722aee1f9593572d5eddfd8cac7075b07cf9ca
+ pip install -e .
+ cd egs2/myst_ogi_cmu_kids/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/myst_ogi_cmu_kids_ctc_nolm
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Feb 18 10:19:00 CST 2025`
+ - python version: `3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]`
+ - espnet version: `espnet 202412`
+ - pytorch version: `pytorch 2.4.0`
+ - Git hash: `6f722aee1f9593572d5eddfd8cac7075b07cf9ca`
+ - Commit date: `Thu Feb 6 22:32:07 2025 -0600`
+
+ ## exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/decode_ctc_bs1_jibo_asr_model_valid.cer_ctc.ave_10best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_jibo/dev|853|853|11.0|88.2|0.8|0.8|89.8|89.7|
+ |data_jibo/test|1044|1043|11.4|87.7|0.9|1.5|90.1|89.4|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_jibo/dev|853|2014|21.4|23.1|55.4|2.5|81.0|89.7|
+ |data_jibo/test|1044|2767|19.9|21.5|58.6|2.1|82.2|89.4|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ ## exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/decode_ctc_bs1_asr_model_valid.cer_ctc.ave_10best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_cmu/dev|237|2170|83.4|14.3|2.3|3.9|20.5|56.5|
+ |data_cmu/test|475|4287|81.9|15.1|3.0|3.7|21.8|59.2|
+ |data_jibo/dev|853|853|29.3|70.3|0.4|227.2|297.9|88.9|
+ |data_jibo/test|1044|1043|29.1|70.9|0.0|318.5|389.4|87.7|
+ |data_myst/dev|9037|153273|87.6|10.6|1.8|3.4|15.8|74.1|
+ |data_myst/test|10311|182712|87.5|10.6|1.9|3.5|16.0|72.0|
+ |data_ogi_scripted/dev|5426|15375|97.4|2.2|0.4|0.4|3.1|5.1|
+ |data_ogi_scripted/test|15945|45419|96.6|2.7|0.6|0.7|4.1|6.9|
+ |data_ogi_spon/dev|349|13561|74.4|22.0|3.6|4.1|29.6|97.7|
+ |data_ogi_spon/test|1095|38811|75.6|21.1|3.2|4.8|29.2|96.8|
+ |high_age/test|11196|56799|1.4|34.9|63.7|63.8|162.4|99.9|
+ |low_age/test|5147|24262|2.2|37.7|60.1|62.3|160.0|98.7|
+ |mid_age/test|26532|374547|11.2|45.6|43.2|44.7|133.6|98.8|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_cmu/dev|237|11449|93.8|3.1|3.1|3.8|10.0|56.5|
+ |data_cmu/test|475|22664|93.1|3.0|3.9|3.3|10.3|59.2|
+ |data_jibo/dev|853|2014|69.2|28.0|2.8|472.7|503.5|88.9|
+ |data_jibo/test|1044|2767|73.9|24.1|2.0|574.8|600.9|87.7|
+ |data_myst/dev|9037|763728|95.9|2.1|2.1|3.3|7.4|74.1|
+ |data_myst/test|10311|911898|95.8|2.0|2.1|3.4|7.6|72.0|
+ |data_ogi_scripted/dev|5426|83141|98.5|0.6|0.9|0.5|2.0|5.1|
+ |data_ogi_scripted/test|15945|244467|98.2|0.7|1.1|0.8|2.5|6.9|
+ |data_ogi_spon/dev|349|58255|89.0|5.8|5.2|4.8|15.8|97.7|
+ |data_ogi_spon/test|1095|165977|89.6|5.6|4.8|5.4|15.8|96.8|
+ |high_age/test|11196|278522|18.8|20.2|61.1|60.5|141.8|99.9|
+ |low_age/test|5147|117778|20.8|22.1|57.1|58.0|137.2|98.7|
+ |mid_age/test|26532|1865279|34.6|20.1|45.3|46.4|111.8|98.8|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ static_graph: false
+ gradient_as_bucket_view: false
+ broadcast_buffers: true
+ bucket_cap_mb: 25
+ compress_gradients: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 70
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - cer_ctc
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 4
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 16000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - exp/asr_stats_raw_en_char/train/speech_shape
+ - exp/asr_stats_raw_en_char/train/text_shape.char
+ valid_shape_file:
+ - exp/asr_stats_raw_en_char/valid/speech_shape
+ - exp/asr_stats_raw_en_char/valid/text_shape.char
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ validate_each_iter_factory: true
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev/text
+     - text
+     - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.002
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 15000
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - E
+ - T
+ - A
+ - O
+ - I
+ - N
+ - H
+ - S
+ - R
+ - L
+ - D
+ - U
+ - W
+ - M
+ - C
+ - G
+ - Y
+ - B
+ - P
+ - F
+ - K
+ - ''''
+ - V
+ - X
+ - J
+ - Z
+ - Q
+ - ','
+ - '-'
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+     brctc_risk_strategy: exp
+     brctc_group_strategy: end
+     brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ use_lang_prompt: false
+ use_nlp_prompt: false
+ token_type: char
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: default
+ frontend_conf:
+     n_fft: 512
+     win_length: 400
+     hop_length: 160
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 5
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 1.0
+     lsm_weight: 0.1
+     length_normalized_loss: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     cgmlp_linear_units: 1024
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     layer_drop_rate: 0.0
+     linear_units: 1024
+     positionwise_layer_type: linear
+     use_ffn: true
+     macaron_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: null
+ decoder_conf: {}
+ preprocessor: default
+ preprocessor_conf: {}
+ masker: null
+ masker_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202412'
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
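The README's config sets `ctc_weight: 1.0` with `decoder: null` and a character-level `token_list`, and the decode directories are named `decode_ctc_bs1`, which suggests CTC-only decoding with beam size 1. The following is a minimal sketch of what greedy CTC decoding over that token list looks like, not ESPnet's own implementation. The function name `ctc_greedy_decode` and the example frame IDs are hypothetical; only the token list is copied from the config.

```python
from itertools import groupby

# Token list copied from the config in this commit (index 0 is the CTC blank).
TOKEN_LIST = [
    "<blank>", "<unk>", "<space>",
    "E", "T", "A", "O", "I", "N", "H", "S", "R", "L", "D", "U", "W",
    "M", "C", "G", "Y", "B", "P", "F", "K", "'", "V", "X", "J", "Z", "Q",
    ",", "-", "<sos/eos>",
]
BLANK_ID = 0


def ctc_greedy_decode(frame_ids):
    """Collapse repeated frame labels, drop blanks, and join characters.

    `frame_ids` stands for the per-frame argmax over the CTC output
    (hypothetical input; a real model would produce it from audio).
    """
    collapsed = [label for label, _ in groupby(frame_ids)]
    tokens = [TOKEN_LIST[i] for i in collapsed if i != BLANK_ID]
    return "".join(" " if tok == "<space>" else tok for tok in tokens)


# Made-up frame sequence: the blank between the two L frames (index 12)
# keeps the doubled letter from being merged into one.
example_frames = [0, 9, 9, 0, 7, 2, 2, 5, 0, 12, 0, 12, 12]
print(ctc_greedy_decode(example_frames))
```

Because repeats are merged before blanks are removed, a doubled letter survives only when the model emits a blank between the two occurrences.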
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/RESULTS.md ADDED
@@ -0,0 +1,70 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Tue Feb 18 10:19:00 CST 2025`
+ - python version: `3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]`
+ - espnet version: `espnet 202412`
+ - pytorch version: `pytorch 2.4.0`
+ - Git hash: `6f722aee1f9593572d5eddfd8cac7075b07cf9ca`
+ - Commit date: `Thu Feb 6 22:32:07 2025 -0600`
+
+ ## exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/decode_ctc_bs1_jibo_asr_model_valid.cer_ctc.ave_10best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_jibo/dev|853|853|11.0|88.2|0.8|0.8|89.8|89.7|
+ |data_jibo/test|1044|1043|11.4|87.7|0.9|1.5|90.1|89.4|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_jibo/dev|853|2014|21.4|23.1|55.4|2.5|81.0|89.7|
+ |data_jibo/test|1044|2767|19.9|21.5|58.6|2.1|82.2|89.4|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ ## exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/decode_ctc_bs1_asr_model_valid.cer_ctc.ave_10best
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_cmu/dev|237|2170|83.4|14.3|2.3|3.9|20.5|56.5|
+ |data_cmu/test|475|4287|81.9|15.1|3.0|3.7|21.8|59.2|
+ |data_jibo/dev|853|853|29.3|70.3|0.4|227.2|297.9|88.9|
+ |data_jibo/test|1044|1043|29.1|70.9|0.0|318.5|389.4|87.7|
+ |data_myst/dev|9037|153273|87.6|10.6|1.8|3.4|15.8|74.1|
+ |data_myst/test|10311|182712|87.5|10.6|1.9|3.5|16.0|72.0|
+ |data_ogi_scripted/dev|5426|15375|97.4|2.2|0.4|0.4|3.1|5.1|
+ |data_ogi_scripted/test|15945|45419|96.6|2.7|0.6|0.7|4.1|6.9|
+ |data_ogi_spon/dev|349|13561|74.4|22.0|3.6|4.1|29.6|97.7|
+ |data_ogi_spon/test|1095|38811|75.6|21.1|3.2|4.8|29.2|96.8|
+ |high_age/test|11196|56799|1.4|34.9|63.7|63.8|162.4|99.9|
+ |low_age/test|5147|24262|2.2|37.7|60.1|62.3|160.0|98.7|
+ |mid_age/test|26532|374547|11.2|45.6|43.2|44.7|133.6|98.8|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |data_cmu/dev|237|11449|93.8|3.1|3.1|3.8|10.0|56.5|
+ |data_cmu/test|475|22664|93.1|3.0|3.9|3.3|10.3|59.2|
+ |data_jibo/dev|853|2014|69.2|28.0|2.8|472.7|503.5|88.9|
+ |data_jibo/test|1044|2767|73.9|24.1|2.0|574.8|600.9|87.7|
+ |data_myst/dev|9037|763728|95.9|2.1|2.1|3.3|7.4|74.1|
+ |data_myst/test|10311|911898|95.8|2.0|2.1|3.4|7.6|72.0|
+ |data_ogi_scripted/dev|5426|83141|98.5|0.6|0.9|0.5|2.0|5.1|
+ |data_ogi_scripted/test|15945|244467|98.2|0.7|1.1|0.8|2.5|6.9|
+ |data_ogi_spon/dev|349|58255|89.0|5.8|5.2|4.8|15.8|97.7|
+ |data_ogi_spon/test|1095|165977|89.6|5.6|4.8|5.4|15.8|96.8|
+ |high_age/test|11196|278522|18.8|20.2|61.1|60.5|141.8|99.9|
+ |low_age/test|5147|117778|20.8|22.1|57.1|58.0|137.2|98.7|
+ |mid_age/test|26532|1865279|34.6|20.1|45.3|46.4|111.8|98.8|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
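Several `Err` values in the tables above exceed 100%, for example 297.9 for `data_jibo/dev` WER. That follows from how the columns relate: in sclite-style scoring, `Err` is the sum of the `Sub`, `Del`, and `Ins` percentages, all measured against the reference length, so a large insertion count pushes the rate past 100%. The sketch below checks this arithmetic on three rows copied from the second WER table. The helper name `error_rate` is made up for illustration, and the small tolerance allows for rounding in the published one-decimal figures.

```python
def error_rate(sub: float, dele: float, ins: float) -> float:
    """Total error rate in percent: substitutions + deletions + insertions."""
    return sub + dele + ins


# (Corr, Sub, Del, Ins, reported Err), copied from the WER table above.
rows = {
    "data_cmu/dev": (83.4, 14.3, 2.3, 3.9, 20.5),
    "data_jibo/dev": (29.3, 70.3, 0.4, 227.2, 297.9),
    "data_ogi_scripted/dev": (97.4, 2.2, 0.4, 0.4, 3.1),
}

for name, (corr, sub, dele, ins, reported) in rows.items():
    total = error_rate(sub, dele, ins)
    # Corr + Sub + Del accounts for every reference word, so it is about 100.
    covered = corr + sub + dele
    agrees = abs(total - reported) <= 0.15  # published values are rounded
    print(f"{name}: Sub+Del+Ins={total:.1f}, reported={reported}, "
          f"agrees={agrees}, Corr+Sub+Del={covered:.1f}")
```

On `data_jibo/dev`, insertions alone contribute 227.2 points, which is why the rate is far above 100% even though 29.3% of reference words are recognized correctly.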
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/config.yaml ADDED
@@ -0,0 +1,263 @@
+ config: conf/tuning/train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter
+ ngpu: 1
+ seed: 2022
+ num_workers: 4
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ use_deepspeed: false
+ deepspeed_config: null
+ static_graph: false
+ gradient_as_bucket_view: false
+ broadcast_buffers: true
+ bucket_cap_mb: 25
+ compress_gradients: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ use_tf32: false
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 70
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - valid
+     - cer_ctc
+     - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 4
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: true
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_adapter: false
+ adapter: lora
+ save_strategy: all
+ adapter_conf: {}
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 20
+ valid_batch_size: null
+ batch_bins: 16000000
+ valid_batch_bins: null
+ category_sample_size: 10
+ train_shape_file:
+ - exp/asr_stats_raw_en_char/train/speech_shape
+ - exp/asr_stats_raw_en_char/train/text_shape.char
+ valid_shape_file:
+ - exp/asr_stats_raw_en_char/valid/speech_shape
+ - exp/asr_stats_raw_en_char/valid/text_shape.char
+ batch_type: numel
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ validate_each_iter_factory: true
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ chunk_max_abs_length: null
+ chunk_discard_short_samples: true
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev/text
+     - text
+     - text
+ multi_task_dataset: false
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+     lr: 0.002
+     weight_decay: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+     warmup_steps: 15000
+ token_list:
+ - <blank>
+ - <unk>
+ - <space>
+ - E
+ - T
+ - A
+ - O
+ - I
+ - N
+ - H
+ - S
+ - R
+ - L
+ - D
+ - U
+ - W
+ - M
+ - C
+ - G
+ - Y
+ - B
+ - P
+ - F
+ - K
+ - ''''
+ - V
+ - X
+ - J
+ - Z
+ - Q
+ - ','
+ - '-'
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: null
+     zero_infinity: true
+     brctc_risk_strategy: exp
+     brctc_group_strategy: end
+     brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ use_lang_prompt: false
+ use_nlp_prompt: false
+ token_type: char
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ aux_ctc_tasks: []
+ frontend: default
+ frontend_conf:
+     n_fft: 512
+     win_length: 400
+     hop_length: 160
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 5
+ normalize: utterance_mvn
+ normalize_conf: {}
+ model: espnet
+ model_conf:
+     ctc_weight: 1.0
+     lsm_weight: 0.1
+     length_normalized_loss: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+     output_size: 256
+     attention_heads: 4
+     attention_layer_type: rel_selfattn
+     pos_enc_layer_type: rel_pos
+     rel_pos_type: latest
+     cgmlp_linear_units: 1024
+     cgmlp_conv_kernel: 31
+     use_linear_after_conv: false
+     gate_activation: identity
+     num_blocks: 12
+     dropout_rate: 0.1
+     positional_dropout_rate: 0.1
+     attention_dropout_rate: 0.1
+     input_layer: conv2d
+     layer_drop_rate: 0.0
+     linear_units: 1024
+     positionwise_layer_type: linear
+     use_ffn: true
+     macaron_ffn: true
+     merge_conv_kernel: 31
+ postencoder: null
+ postencoder_conf: {}
+ decoder: null
+ decoder_conf: {}
+ preprocessor: default
+ preprocessor_conf: {}
+ masker: null
+ masker_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202412'
+ distributed: false
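The config above pairs `optim: adam` at `lr: 0.002` with `scheduler: warmuplr` and `warmup_steps: 15000`. As a rough sketch, assuming the scheduler follows the usual Noam-style rule (linear ramp up to the base rate at `warmup_steps`, then inverse-square-root decay), the learning rate at a given optimizer step can be computed as below. The function name `warmup_lr` is hypothetical and this is not ESPnet's scheduler class itself.

```python
def warmup_lr(step: int, base_lr: float = 0.002, warmup_steps: int = 15000) -> float:
    """Noam-style warmup: rises linearly, peaks at base_lr, then decays as step**-0.5.

    Defaults mirror `optim_conf.lr` and `scheduler_conf.warmup_steps`
    from the config above; the formula itself is an assumption.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)


for step in (1, 7500, 15000, 60000):
    print(f"step {step:>6}: lr = {warmup_lr(step):.6f}")
```

Under this rule the peak rate equals the configured `lr`. With `accum_grad: 4`, a step here presumably corresponds to one optimizer update, that is, four accumulated mini-batches.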
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/acc.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/backward_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/cer.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/cer_ctc.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/clip.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/forward_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/gpu_max_alloc_mem_GB.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/grad_norm.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/iter_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/loss.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/loss_att.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/loss_ctc.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/loss_scale.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/optim0_lr0.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/optim_step_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/train_time.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/images/wer.png ADDED
exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/valid.cer_ctc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5accf8363ca6f03850d50ff85035a78bf67de90ebd0f45814a59c5c7a6b5566
+ size 100907247
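The three lines added for `valid.cer_ctc.ave_10best.pth` are a Git LFS pointer rather than the checkpoint itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A small sketch of how such a pointer could be parsed and used to check a downloaded file follows. The helper names `parse_lfs_pointer` and `matches_pointer` are made up for illustration.

```python
import hashlib

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f5accf8363ca6f03850d50ff85035a78bf67de90ebd0f45814a59c5c7a6b5566
size 100907247
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split(" ", 1)
            fields[key] = value
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True when the bytes have the size and hash the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"], f"{pointer['size'] / 2**20:.1f} MiB")
```

To check a real download, the file's bytes would be read and passed to `matches_pointer`. For a file of this size, hashing in chunks would use less memory than reading it whole.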
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202412'
+ files:
+   asr_model_file: exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/valid.cer_ctc.ave_10best.pth
+ python: 3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]
+ timestamp: 1756005685.972703
+ torch: 2.4.0
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_ctc_e_branchformer_e12_mlp1024_linear1024_lr002_raw_en_char_dur05_filter/config.yaml
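The `timestamp` field in `meta.yaml` looks like Unix epoch seconds. Assuming that interpretation, it can be converted to a calendar date with the standard library, which makes it easier to compare the packaging time with the `date` recorded in the RESULTS environment section. The variable names below are illustrative only.

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; treated here as seconds since the Unix epoch.
META_TIMESTAMP = 1756005685.972703

packed_at = datetime.fromtimestamp(META_TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat())
```

Under this reading the model was packaged in August 2025, several months after the February 2025 date recorded with the results.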