eric102004 committed
Commit 7fb9ef8 · 1 Parent(s): 88a8543

Update model

Files changed (23)
  1. README.md +390 -3
  2. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/RESULTS.md +45 -0
  3. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/config.yaml +271 -0
  4. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/acc.png +0 -0
  5. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/backward_time.png +0 -0
  6. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/cer.png +0 -0
  7. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/cer_ctc.png +0 -0
  8. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/clip.png +0 -0
  9. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/forward_time.png +0 -0
  10. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/gpu_max_alloc_mem_GB.png +0 -0
  11. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/gpu_max_cached_mem_GB.png +0 -0
  12. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/grad_norm.png +0 -0
  13. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/iter_time.png +0 -0
  14. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/loss.png +0 -0
  15. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/loss_att.png +0 -0
  16. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/loss_ctc.png +0 -0
  17. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/loss_scale.png +0 -0
  18. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/optim0_lr0.png +0 -0
  19. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/optim_step_time.png +0 -0
  20. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/train_time.png +0 -0
  21. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/wer.png +0 -0
  22. exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/valid.cer.ave_10best.pth +3 -0
  23. meta.yaml +8 -0
README.md CHANGED
@@ -1,3 +1,390 @@
1
- ---
2
- license: cc-by-4.0
3
- ---
1
+ ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - automatic-speech-recognition
6
+ language: en
7
+ datasets:
8
+ - myst_ogi_cmu_kids
9
+ license: cc-by-4.0
10
+ ---
11
+
12
+ ## ESPnet2 ASR model
13
+
14
+ ### `espnet/myst_ogi_cmu_kids_aed_upsample`
15
+
16
+ This model was trained by eric102004 using the myst_ogi_cmu_kids recipe in [espnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ ```bash
24
+ cd espnet
25
+ git checkout 6f722aee1f9593572d5eddfd8cac7075b07cf9ca
26
+ pip install -e .
27
+ cd egs2/myst_ogi_cmu_kids/asr1
28
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/myst_ogi_cmu_kids_aed_upsample
29
+ ```
30
+
31
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
32
+ # RESULTS
33
+ ## Environments
34
+ - date: `Wed Feb 19 19:12:32 CST 2025`
35
+ - python version: `3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]`
36
+ - espnet version: `espnet 202412`
37
+ - pytorch version: `pytorch 2.4.0`
38
+ - Git hash: `6f722aee1f9593572d5eddfd8cac7075b07cf9ca`
39
+ - Commit date: `Thu Feb 6 22:32:07 2025 -0600`
40
+
41
+ ## exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/decode_asr_asr_model_valid.cer.ave_10best
42
+ ### WER
43
+
44
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
45
+ |---|---|---|---|---|---|---|---|---|
46
+ |data_cmu/dev|237|2170|89.1|8.7|2.2|2.4|13.3|37.1|
47
+ |data_cmu/test|475|4287|88.3|9.0|2.6|2.4|14.0|40.0|
48
+ |data_jibo/dev|853|853|19.1|80.9|0.0|219.0|299.9|97.4|
49
+ |data_jibo/test|1044|1044|20.4|79.6|0.0|306.0|385.6|94.1|
50
+ |data_myst/dev|9037|153273|90.4|7.7|1.8|2.9|12.5|66.8|
51
+ |data_myst/test|10311|182712|89.9|7.8|2.3|3.1|13.1|65.0|
52
+ |data_ogi_scripted/dev|5426|15375|98.7|1.1|0.2|0.2|1.5|2.2|
53
+ |data_ogi_scripted/test|15945|45419|98.5|1.2|0.3|0.3|1.9|2.7|
54
+ |data_ogi_spon/dev|349|13561|81.0|15.2|3.8|3.2|22.2|96.6|
55
+ |data_ogi_spon/test|1095|38811|81.8|14.9|3.3|3.8|22.0|95.3|
56
+
57
+ ### CER
58
+
59
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
60
+ |---|---|---|---|---|---|---|---|---|
61
+ |data_cmu/dev|237|11449|94.9|2.5|2.6|2.4|7.4|37.1|
62
+ |data_cmu/test|475|22664|94.4|2.3|3.3|2.2|7.8|40.0|
63
+ |data_jibo/dev|853|2558|55.8|33.3|10.9|362.2|406.4|97.4|
64
+ |data_jibo/test|1044|3259|62.1|29.5|8.4|498.7|536.6|94.1|
65
+ |data_myst/dev|9037|763728|96.2|1.8|2.0|2.9|6.7|66.8|
66
+ |data_myst/test|10311|911898|95.8|1.8|2.4|3.1|7.3|65.0|
67
+ |data_ogi_scripted/dev|5426|83141|99.0|0.5|0.4|0.3|1.2|2.2|
68
+ |data_ogi_scripted/test|15945|244467|98.8|0.6|0.5|0.4|1.5|2.7|
69
+ |data_ogi_spon/dev|349|58255|90.3|4.8|4.9|3.7|13.4|96.6|
70
+ |data_ogi_spon/test|1095|165977|90.9|4.6|4.5|4.3|13.4|95.3|
71
+
72
+ ### TER
73
+
74
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
75
+ |---|---|---|---|---|---|---|---|---|
76
+
77
+ ## ASR config
78
+
79
+ <details><summary>expand</summary>
80
+
81
+ ```
82
+ config: conf/tuning/train_asr_lr002.yaml
83
+ print_config: false
84
+ log_level: INFO
85
+ drop_last_iter: false
86
+ dry_run: false
87
+ iterator_type: sequence
88
+ valid_iterator_type: null
89
+ output_dir: exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4
90
+ ngpu: 1
91
+ seed: 2022
92
+ num_workers: 4
93
+ num_att_plot: 3
94
+ dist_backend: nccl
95
+ dist_init_method: env://
96
+ dist_world_size: null
97
+ dist_rank: null
98
+ local_rank: 0
99
+ dist_master_addr: null
100
+ dist_master_port: null
101
+ dist_launcher: null
102
+ multiprocessing_distributed: false
103
+ unused_parameters: false
104
+ sharded_ddp: false
105
+ use_deepspeed: false
106
+ deepspeed_config: null
107
+ static_graph: false
108
+ gradient_as_bucket_view: false
109
+ broadcast_buffers: true
110
+ bucket_cap_mb: 25
111
+ compress_gradients: false
112
+ cudnn_enabled: true
113
+ cudnn_benchmark: false
114
+ cudnn_deterministic: true
115
+ use_tf32: false
116
+ collect_stats: false
117
+ write_collected_feats: false
118
+ max_epoch: 70
119
+ patience: null
120
+ val_scheduler_criterion:
121
+ - valid
122
+ - loss
123
+ early_stopping_criterion:
124
+ - valid
125
+ - loss
126
+ - min
127
+ best_model_criterion:
128
+ - - valid
129
+ - cer
130
+ - min
131
+ keep_nbest_models: 10
132
+ nbest_averaging_interval: 0
133
+ grad_clip: 5.0
134
+ grad_clip_type: 2.0
135
+ grad_noise: false
136
+ accum_grad: 4
137
+ no_forward_run: false
138
+ resume: true
139
+ train_dtype: float32
140
+ use_amp: true
141
+ log_interval: null
142
+ use_matplotlib: true
143
+ use_tensorboard: true
144
+ create_graph_in_tensorboard: false
145
+ use_wandb: false
146
+ wandb_project: null
147
+ wandb_id: null
148
+ wandb_entity: null
149
+ wandb_name: null
150
+ wandb_model_log_interval: -1
151
+ detect_anomaly: false
152
+ use_adapter: false
153
+ adapter: lora
154
+ save_strategy: all
155
+ adapter_conf: {}
156
+ pretrain_path: null
157
+ init_param: []
158
+ ignore_init_mismatch: false
159
+ freeze_param: []
160
+ num_iters_per_epoch: null
161
+ batch_size: 20
162
+ valid_batch_size: null
163
+ batch_bins: 16000000
164
+ valid_batch_bins: null
165
+ category_sample_size: 10
166
+ train_shape_file:
167
+ - exp/asr_stats_raw_en_char_ba_h2l4/train/speech_shape
168
+ - exp/asr_stats_raw_en_char_ba_h2l4/train/text_shape.char
169
+ valid_shape_file:
170
+ - exp/asr_stats_raw_en_char_ba_h2l4/valid/speech_shape
171
+ - exp/asr_stats_raw_en_char_ba_h2l4/valid/text_shape.char
172
+ batch_type: numel
173
+ valid_batch_type: null
174
+ fold_length:
175
+ - 80000
176
+ - 150
177
+ sort_in_batch: descending
178
+ shuffle_within_batch: false
179
+ sort_batch: descending
180
+ multiple_iterator: false
181
+ validate_each_iter_factory: true
182
+ chunk_length: 500
183
+ chunk_shift_ratio: 0.5
184
+ num_cache_chunks: 1024
185
+ chunk_excluded_key_prefixes: []
186
+ chunk_default_fs: null
187
+ chunk_max_abs_length: null
188
+ chunk_discard_short_samples: true
189
+ train_data_path_and_name_and_type:
190
+ - - dump_no_special_ba_h2l4/raw/train/wav.scp
191
+ - speech
192
+ - sound
193
+ - - dump_no_special_ba_h2l4/raw/train/text
194
+ - text
195
+ - text
196
+ valid_data_path_and_name_and_type:
197
+ - - dump_no_special_ba_h2l4/raw/dev/wav.scp
198
+ - speech
199
+ - sound
200
+ - - dump_no_special_ba_h2l4/raw/dev/text
201
+ - text
202
+ - text
203
+ multi_task_dataset: false
204
+ allow_variable_data_keys: false
205
+ max_cache_size: 0.0
206
+ max_cache_fd: 32
207
+ allow_multi_rates: false
208
+ valid_max_cache_size: null
209
+ exclude_weight_decay: false
210
+ exclude_weight_decay_conf: {}
211
+ optim: adam
212
+ optim_conf:
213
+ lr: 0.002
214
+ weight_decay: 1.0e-06
215
+ scheduler: warmuplr
216
+ scheduler_conf:
217
+ warmup_steps: 15000
218
+ token_list:
219
+ - <blank>
220
+ - <unk>
221
+ - <space>
222
+ - E
223
+ - T
224
+ - A
225
+ - O
226
+ - I
227
+ - N
228
+ - H
229
+ - S
230
+ - R
231
+ - L
232
+ - D
233
+ - U
234
+ - W
235
+ - M
236
+ - C
237
+ - G
238
+ - Y
239
+ - B
240
+ - P
241
+ - F
242
+ - K
243
+ - ''''
244
+ - V
245
+ - X
246
+ - J
247
+ - Z
248
+ - Q
249
+ - ','
250
+ - '-'
251
+ - <sos/eos>
252
+ init: null
253
+ input_size: null
254
+ ctc_conf:
255
+ dropout_rate: 0.0
256
+ ctc_type: builtin
257
+ reduce: true
258
+ ignore_nan_grad: null
259
+ zero_infinity: true
260
+ brctc_risk_strategy: exp
261
+ brctc_group_strategy: end
262
+ brctc_risk_factor: 0.0
263
+ joint_net_conf: null
264
+ use_preprocessor: true
265
+ use_lang_prompt: false
266
+ use_nlp_prompt: false
267
+ token_type: char
268
+ bpemodel: null
269
+ non_linguistic_symbols: null
270
+ cleaner: null
271
+ g2p: null
272
+ speech_volume_normalize: null
273
+ rir_scp: null
274
+ rir_apply_prob: 1.0
275
+ noise_scp: null
276
+ noise_apply_prob: 1.0
277
+ noise_db_range: '13_15'
278
+ short_noise_thres: 0.5
279
+ aux_ctc_tasks: []
280
+ frontend: default
281
+ frontend_conf:
282
+ n_fft: 512
283
+ win_length: 400
284
+ hop_length: 160
285
+ fs: 16k
286
+ specaug: specaug
287
+ specaug_conf:
288
+ apply_time_warp: true
289
+ time_warp_window: 5
290
+ time_warp_mode: bicubic
291
+ apply_freq_mask: true
292
+ freq_mask_width_range:
293
+ - 0
294
+ - 27
295
+ num_freq_mask: 2
296
+ apply_time_mask: true
297
+ time_mask_width_ratio_range:
298
+ - 0.0
299
+ - 0.05
300
+ num_time_mask: 5
301
+ normalize: utterance_mvn
302
+ normalize_conf: {}
303
+ model: espnet
304
+ model_conf:
305
+ ctc_weight: 0.3
306
+ lsm_weight: 0.1
307
+ length_normalized_loss: false
308
+ preencoder: null
309
+ preencoder_conf: {}
310
+ encoder: e_branchformer
311
+ encoder_conf:
312
+ output_size: 256
313
+ attention_heads: 4
314
+ attention_layer_type: rel_selfattn
315
+ pos_enc_layer_type: rel_pos
316
+ rel_pos_type: latest
317
+ cgmlp_linear_units: 1024
318
+ cgmlp_conv_kernel: 31
319
+ use_linear_after_conv: false
320
+ gate_activation: identity
321
+ num_blocks: 12
322
+ dropout_rate: 0.1
323
+ positional_dropout_rate: 0.1
324
+ attention_dropout_rate: 0.1
325
+ input_layer: conv2d
326
+ layer_drop_rate: 0.0
327
+ linear_units: 1024
328
+ positionwise_layer_type: linear
329
+ use_ffn: true
330
+ macaron_ffn: true
331
+ merge_conv_kernel: 31
332
+ postencoder: null
333
+ postencoder_conf: {}
334
+ decoder: transformer
335
+ decoder_conf:
336
+ attention_heads: 4
337
+ linear_units: 2048
338
+ num_blocks: 6
339
+ dropout_rate: 0.1
340
+ positional_dropout_rate: 0.1
341
+ self_attention_dropout_rate: 0.1
342
+ src_attention_dropout_rate: 0.1
343
+ layer_drop_rate: 0.0
344
+ preprocessor: default
345
+ preprocessor_conf: {}
346
+ masker: null
347
+ masker_conf: {}
348
+ required:
349
+ - output_dir
350
+ - token_list
351
+ version: '202412'
352
+ distributed: false
353
+ ```
354
+
355
+ </details>
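The `frontend_conf` values in the config above (`fs: 16k`, `win_length: 400`, `hop_length: 160`) translate into frame timings that are easier to reason about in milliseconds. The sketch below converts them and estimates how many encoder frames one second of audio yields. Two things here are assumptions rather than facts stated in the config: that the STFT uses center padding (so the frame count is `1 + samples // hop`), and that the `conv2d` input layer is a pair of kernel-3, stride-2 convolutions giving roughly 4x time subsampling. All function names are illustrative.

```python
# Values copied from frontend_conf in the config.
FS = 16000          # "fs: 16k"
WIN_LENGTH = 400    # samples
HOP_LENGTH = 160    # samples


def to_ms(samples, fs=FS):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * samples / fs


def stft_frames(num_samples, hop=HOP_LENGTH):
    """Frame count of a centered STFT (assumption: center padding is enabled)."""
    return 1 + num_samples // hop


def conv2d_subsampled(frames):
    """Length after two kernel-3, stride-2 convolutions without padding
    (assumed layout of the `conv2d` input layer)."""
    return ((frames - 1) // 2 - 1) // 2


one_second = FS  # number of samples in 1 s of audio
frames = stft_frames(one_second)
print("window (ms):", to_ms(WIN_LENGTH))
print("hop (ms):", to_ms(HOP_LENGTH))
print("STFT frames per second:", frames)
print("encoder frames per second:", conv2d_subsampled(frames))
```

Under these assumptions each encoder frame covers about 40 ms of audio, which is worth keeping in mind when reading the `fold_length` of 80000 samples (5 s at 16 kHz).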
356
+
357
+
358
+
359
+ ### Citing ESPnet
360
+
361
+ ```bibtex
362
+ @inproceedings{watanabe2018espnet,
363
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
364
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
365
+ year={2018},
366
+ booktitle={Proceedings of Interspeech},
367
+ pages={2207--2211},
368
+ doi={10.21437/Interspeech.2018-1456},
369
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
370
+ }
371
+
372
+
373
+
374
+
375
+
376
+
377
+ ```
378
+
379
+ or arXiv:
380
+
381
+ ```bibtex
382
+ @misc{watanabe2018espnet,
383
+ title={ESPnet: End-to-End Speech Processing Toolkit},
384
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
385
+ year={2018},
386
+ eprint={1804.00015},
387
+ archivePrefix={arXiv},
388
+ primaryClass={cs.CL}
389
+ }
390
+ ```
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/RESULTS.md ADDED
@@ -0,0 +1,45 @@
1
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Wed Feb 19 19:12:32 CST 2025`
5
+ - python version: `3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]`
6
+ - espnet version: `espnet 202412`
7
+ - pytorch version: `pytorch 2.4.0`
8
+ - Git hash: `6f722aee1f9593572d5eddfd8cac7075b07cf9ca`
9
+ - Commit date: `Thu Feb 6 22:32:07 2025 -0600`
10
+
11
+ ## exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/decode_asr_asr_model_valid.cer.ave_10best
12
+ ### WER
13
+
14
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
15
+ |---|---|---|---|---|---|---|---|---|
16
+ |data_cmu/dev|237|2170|89.1|8.7|2.2|2.4|13.3|37.1|
17
+ |data_cmu/test|475|4287|88.3|9.0|2.6|2.4|14.0|40.0|
18
+ |data_jibo/dev|853|853|19.1|80.9|0.0|219.0|299.9|97.4|
19
+ |data_jibo/test|1044|1044|20.4|79.6|0.0|306.0|385.6|94.1|
20
+ |data_myst/dev|9037|153273|90.4|7.7|1.8|2.9|12.5|66.8|
21
+ |data_myst/test|10311|182712|89.9|7.8|2.3|3.1|13.1|65.0|
22
+ |data_ogi_scripted/dev|5426|15375|98.7|1.1|0.2|0.2|1.5|2.2|
23
+ |data_ogi_scripted/test|15945|45419|98.5|1.2|0.3|0.3|1.9|2.7|
24
+ |data_ogi_spon/dev|349|13561|81.0|15.2|3.8|3.2|22.2|96.6|
25
+ |data_ogi_spon/test|1095|38811|81.8|14.9|3.3|3.8|22.0|95.3|
26
+
27
+ ### CER
28
+
29
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
30
+ |---|---|---|---|---|---|---|---|---|
31
+ |data_cmu/dev|237|11449|94.9|2.5|2.6|2.4|7.4|37.1|
32
+ |data_cmu/test|475|22664|94.4|2.3|3.3|2.2|7.8|40.0|
33
+ |data_jibo/dev|853|2558|55.8|33.3|10.9|362.2|406.4|97.4|
34
+ |data_jibo/test|1044|3259|62.1|29.5|8.4|498.7|536.6|94.1|
35
+ |data_myst/dev|9037|763728|96.2|1.8|2.0|2.9|6.7|66.8|
36
+ |data_myst/test|10311|911898|95.8|1.8|2.4|3.1|7.3|65.0|
37
+ |data_ogi_scripted/dev|5426|83141|99.0|0.5|0.4|0.3|1.2|2.2|
38
+ |data_ogi_scripted/test|15945|244467|98.8|0.6|0.5|0.4|1.5|2.7|
39
+ |data_ogi_spon/dev|349|58255|90.3|4.8|4.9|3.7|13.4|96.6|
40
+ |data_ogi_spon/test|1095|165977|90.9|4.6|4.5|4.3|13.4|95.3|
41
+
42
+ ### TER
43
+
44
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
45
+ |---|---|---|---|---|---|---|---|---|
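The `Err` column in the WER and CER tables appears to follow the usual sclite-style convention, Err = Sub + Del + Ins, with every rate given as a percentage of the reference word (or character) count. Because insertions are measured against the reference length, the total can exceed 100%, which is what the `data_jibo` rows show: the hypotheses contain far more tokens than the references. A minimal sketch that checks this relationship on a few WER rows copied from the table; the function names and the 0.2 tolerance (to absorb one-decimal rounding of four values) are illustrative assumptions.

```python
# Rows copied from the WER table: (dataset, Sub, Del, Ins, Err), all in percent.
WER_ROWS = [
    ("data_cmu/dev", 8.7, 2.2, 2.4, 13.3),
    ("data_jibo/dev", 80.9, 0.0, 219.0, 299.9),
    ("data_myst/dev", 7.7, 1.8, 2.9, 12.5),
    ("data_ogi_scripted/test", 1.2, 0.3, 0.3, 1.9),
]


def error_rate(sub, dele, ins):
    """Total error rate as the sum of substitution, deletion and insertion rates."""
    return sub + dele + ins


def is_consistent(sub, dele, ins, err, tol=0.2):
    """True if the reported Err matches Sub + Del + Ins up to rounding."""
    return abs(error_rate(sub, dele, ins) - err) <= tol


for name, sub, dele, ins, err in WER_ROWS:
    total = error_rate(sub, dele, ins)
    print(f"{name}: computed {total:.1f}, reported {err}, "
          f"consistent={is_consistent(sub, dele, ins, err)}")
```

The small mismatches (for example 12.4 computed versus 12.5 reported for `data_myst/dev`) are what one would expect if each column was rounded independently from the raw counts.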
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/config.yaml ADDED
@@ -0,0 +1,271 @@
1
+ config: conf/tuning/train_asr_lr002.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ drop_last_iter: false
5
+ dry_run: false
6
+ iterator_type: sequence
7
+ valid_iterator_type: null
8
+ output_dir: exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4
9
+ ngpu: 1
10
+ seed: 2022
11
+ num_workers: 4
12
+ num_att_plot: 3
13
+ dist_backend: nccl
14
+ dist_init_method: env://
15
+ dist_world_size: null
16
+ dist_rank: null
17
+ local_rank: 0
18
+ dist_master_addr: null
19
+ dist_master_port: null
20
+ dist_launcher: null
21
+ multiprocessing_distributed: false
22
+ unused_parameters: false
23
+ sharded_ddp: false
24
+ use_deepspeed: false
25
+ deepspeed_config: null
26
+ static_graph: false
27
+ gradient_as_bucket_view: false
28
+ broadcast_buffers: true
29
+ bucket_cap_mb: 25
30
+ compress_gradients: false
31
+ cudnn_enabled: true
32
+ cudnn_benchmark: false
33
+ cudnn_deterministic: true
34
+ use_tf32: false
35
+ collect_stats: false
36
+ write_collected_feats: false
37
+ max_epoch: 70
38
+ patience: null
39
+ val_scheduler_criterion:
40
+ - valid
41
+ - loss
42
+ early_stopping_criterion:
43
+ - valid
44
+ - loss
45
+ - min
46
+ best_model_criterion:
47
+ - - valid
48
+ - cer
49
+ - min
50
+ keep_nbest_models: 10
51
+ nbest_averaging_interval: 0
52
+ grad_clip: 5.0
53
+ grad_clip_type: 2.0
54
+ grad_noise: false
55
+ accum_grad: 4
56
+ no_forward_run: false
57
+ resume: true
58
+ train_dtype: float32
59
+ use_amp: true
60
+ log_interval: null
61
+ use_matplotlib: true
62
+ use_tensorboard: true
63
+ create_graph_in_tensorboard: false
64
+ use_wandb: false
65
+ wandb_project: null
66
+ wandb_id: null
67
+ wandb_entity: null
68
+ wandb_name: null
69
+ wandb_model_log_interval: -1
70
+ detect_anomaly: false
71
+ use_adapter: false
72
+ adapter: lora
73
+ save_strategy: all
74
+ adapter_conf: {}
75
+ pretrain_path: null
76
+ init_param: []
77
+ ignore_init_mismatch: false
78
+ freeze_param: []
79
+ num_iters_per_epoch: null
80
+ batch_size: 20
81
+ valid_batch_size: null
82
+ batch_bins: 16000000
83
+ valid_batch_bins: null
84
+ category_sample_size: 10
85
+ train_shape_file:
86
+ - exp/asr_stats_raw_en_char_ba_h2l4/train/speech_shape
87
+ - exp/asr_stats_raw_en_char_ba_h2l4/train/text_shape.char
88
+ valid_shape_file:
89
+ - exp/asr_stats_raw_en_char_ba_h2l4/valid/speech_shape
90
+ - exp/asr_stats_raw_en_char_ba_h2l4/valid/text_shape.char
91
+ batch_type: numel
92
+ valid_batch_type: null
93
+ fold_length:
94
+ - 80000
95
+ - 150
96
+ sort_in_batch: descending
97
+ shuffle_within_batch: false
98
+ sort_batch: descending
99
+ multiple_iterator: false
100
+ validate_each_iter_factory: true
101
+ chunk_length: 500
102
+ chunk_shift_ratio: 0.5
103
+ num_cache_chunks: 1024
104
+ chunk_excluded_key_prefixes: []
105
+ chunk_default_fs: null
106
+ chunk_max_abs_length: null
107
+ chunk_discard_short_samples: true
108
+ train_data_path_and_name_and_type:
109
+ - - dump_no_special_ba_h2l4/raw/train/wav.scp
110
+ - speech
111
+ - sound
112
+ - - dump_no_special_ba_h2l4/raw/train/text
113
+ - text
114
+ - text
115
+ valid_data_path_and_name_and_type:
116
+ - - dump_no_special_ba_h2l4/raw/dev/wav.scp
117
+ - speech
118
+ - sound
119
+ - - dump_no_special_ba_h2l4/raw/dev/text
120
+ - text
121
+ - text
122
+ multi_task_dataset: false
123
+ allow_variable_data_keys: false
124
+ max_cache_size: 0.0
125
+ max_cache_fd: 32
126
+ allow_multi_rates: false
127
+ valid_max_cache_size: null
128
+ exclude_weight_decay: false
129
+ exclude_weight_decay_conf: {}
130
+ optim: adam
131
+ optim_conf:
132
+ lr: 0.002
133
+ weight_decay: 1.0e-06
134
+ scheduler: warmuplr
135
+ scheduler_conf:
136
+ warmup_steps: 15000
137
+ token_list:
138
+ - <blank>
139
+ - <unk>
140
+ - <space>
141
+ - E
142
+ - T
143
+ - A
144
+ - O
145
+ - I
146
+ - N
147
+ - H
148
+ - S
149
+ - R
150
+ - L
151
+ - D
152
+ - U
153
+ - W
154
+ - M
155
+ - C
156
+ - G
157
+ - Y
158
+ - B
159
+ - P
160
+ - F
161
+ - K
162
+ - ''''
163
+ - V
164
+ - X
165
+ - J
166
+ - Z
167
+ - Q
168
+ - ','
169
+ - '-'
170
+ - <sos/eos>
171
+ init: null
172
+ input_size: null
173
+ ctc_conf:
174
+ dropout_rate: 0.0
175
+ ctc_type: builtin
176
+ reduce: true
177
+ ignore_nan_grad: null
178
+ zero_infinity: true
179
+ brctc_risk_strategy: exp
180
+ brctc_group_strategy: end
181
+ brctc_risk_factor: 0.0
182
+ joint_net_conf: null
183
+ use_preprocessor: true
184
+ use_lang_prompt: false
185
+ use_nlp_prompt: false
186
+ token_type: char
187
+ bpemodel: null
188
+ non_linguistic_symbols: null
189
+ cleaner: null
190
+ g2p: null
191
+ speech_volume_normalize: null
192
+ rir_scp: null
193
+ rir_apply_prob: 1.0
194
+ noise_scp: null
195
+ noise_apply_prob: 1.0
196
+ noise_db_range: '13_15'
197
+ short_noise_thres: 0.5
198
+ aux_ctc_tasks: []
199
+ frontend: default
200
+ frontend_conf:
201
+ n_fft: 512
202
+ win_length: 400
203
+ hop_length: 160
204
+ fs: 16k
205
+ specaug: specaug
206
+ specaug_conf:
207
+ apply_time_warp: true
208
+ time_warp_window: 5
209
+ time_warp_mode: bicubic
210
+ apply_freq_mask: true
211
+ freq_mask_width_range:
212
+ - 0
213
+ - 27
214
+ num_freq_mask: 2
215
+ apply_time_mask: true
216
+ time_mask_width_ratio_range:
217
+ - 0.0
218
+ - 0.05
219
+ num_time_mask: 5
220
+ normalize: utterance_mvn
221
+ normalize_conf: {}
222
+ model: espnet
223
+ model_conf:
224
+ ctc_weight: 0.3
225
+ lsm_weight: 0.1
226
+ length_normalized_loss: false
227
+ preencoder: null
228
+ preencoder_conf: {}
229
+ encoder: e_branchformer
230
+ encoder_conf:
231
+ output_size: 256
232
+ attention_heads: 4
233
+ attention_layer_type: rel_selfattn
234
+ pos_enc_layer_type: rel_pos
235
+ rel_pos_type: latest
236
+ cgmlp_linear_units: 1024
237
+ cgmlp_conv_kernel: 31
238
+ use_linear_after_conv: false
239
+ gate_activation: identity
240
+ num_blocks: 12
241
+ dropout_rate: 0.1
242
+ positional_dropout_rate: 0.1
243
+ attention_dropout_rate: 0.1
244
+ input_layer: conv2d
245
+ layer_drop_rate: 0.0
246
+ linear_units: 1024
247
+ positionwise_layer_type: linear
248
+ use_ffn: true
249
+ macaron_ffn: true
250
+ merge_conv_kernel: 31
251
+ postencoder: null
252
+ postencoder_conf: {}
253
+ decoder: transformer
254
+ decoder_conf:
255
+ attention_heads: 4
256
+ linear_units: 2048
257
+ num_blocks: 6
258
+ dropout_rate: 0.1
259
+ positional_dropout_rate: 0.1
260
+ self_attention_dropout_rate: 0.1
261
+ src_attention_dropout_rate: 0.1
262
+ layer_drop_rate: 0.0
263
+ preprocessor: default
264
+ preprocessor_conf: {}
265
+ masker: null
266
+ masker_conf: {}
267
+ required:
268
+ - output_dir
269
+ - token_list
270
+ version: '202412'
271
+ distributed: false
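The config pairs `optim_conf.lr: 0.002` with `scheduler: warmuplr` and `warmup_steps: 15000`. Assuming the scheduler follows the Noam-style rule documented for ESPnet2's `WarmupLR` (linear ramp up to the base rate at `warmup_steps`, then inverse-square-root decay), the learning rate at a given optimizer step can be sketched as below. The function name is hypothetical and the formula is stated here as an assumption, not read from this commit.

```python
def warmup_lr(step, base_lr=0.002, warmup_steps=15000):
    """Learning rate at optimizer step `step` (1-based) under a Noam-style warmup.

    Assumed rule: base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Defaults mirror optim_conf.lr and scheduler_conf.warmup_steps from the config.
for step in (1, 7500, 15000, 60000):
    print(step, f"{warmup_lr(step):.6f}")
```

Under this rule the rate peaks at exactly `lr` when `step == warmup_steps`, and falls back to half of that once the step count reaches four times `warmup_steps`.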
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/acc.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/backward_time.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/cer.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/cer_ctc.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/clip.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/forward_time.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/gpu_max_alloc_mem_GB.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/grad_norm.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/iter_time.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/loss.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/loss_att.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/loss_ctc.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/loss_scale.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/optim0_lr0.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/optim_step_time.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/train_time.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/images/wer.png ADDED
exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/valid.cer.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6bc682a914dd4c4f4b92a6f5f236d5ef67da32af8a0f8f530f983ab1c6fa73a2
3
+ size 138927150
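The three lines added for `valid.cer.ave_10best.pth` are a Git LFS pointer, not the checkpoint itself: the `oid` records the SHA-256 of the real file and `size` its length in bytes. A minimal sketch, using only the standard library, that parses such a pointer and checks a downloaded file against it. The helper names are hypothetical, and the local path passed to `matches_pointer` is whatever location the file was downloaded to.

```python
import hashlib

# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:6bc682a914dd4c4f4b92a6f5f236d5ef67da32af8a0f8f530f983ab1c6fa73a2
size 138927150
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest named in `pointer`."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["digest"][:12], f"{pointer['size'] / 2**20:.1f} MiB")
```

Calling `matches_pointer("valid.cer.ave_10best.pth", pointer)` after a download is a quick way to catch a truncated or un-smudged checkpoint before loading it.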
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: '202412'
2
+ files:
3
+ asr_model_file: exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/valid.cer.ave_10best.pth
4
+ python: 3.12.3 | packaged by Anaconda, Inc. | (main, May 6 2024, 19:46:43) [GCC 11.2.0]
5
+ timestamp: 1756005458.084908
6
+ torch: 2.4.0
7
+ yaml_files:
8
+ asr_train_config: exp/asr_train_asr_lr002_raw_en_char_dur05_filter_ba_h2l4/config.yaml
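The `timestamp` field in `meta.yaml` is a bare float. Assuming it is seconds since the Unix epoch (the kind of value Python's `datetime.timestamp()` returns), it can be turned into a readable UTC date as sketched below; the variable and function names are illustrative.

```python
from datetime import datetime, timezone

# Value of `timestamp` copied from meta.yaml.
META_TIMESTAMP = 1756005458.084908


def to_utc(ts):
    """Interpret `ts` as seconds since the Unix epoch and return an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


packed_at = to_utc(META_TIMESTAMP)
print(packed_at.isoformat())
```

Under that assumption the model was packed in August 2025, several months after the Feb 19, 2025 date recorded in the RESULTS environment block, so the two dates describe different events (scoring versus packaging).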