matejhornik committed on May 17

Commit 7daa40c (verified) · 1 parent: c85954a

file upload

Files changed (39)

.gitattributes +1 -0
README.md +296 -3
all_results.json +20 -0
checkpoint-29000/config.json +303 -0
checkpoint-29000/generation_config.json +13 -0
checkpoint-29000/model.safetensors +3 -0
checkpoint-29000/optimizer.pt +3 -0
checkpoint-29000/preprocessor_config.json +9 -0
checkpoint-29000/rng_state.pth +3 -0
checkpoint-29000/scheduler.pt +3 -0
checkpoint-29000/trainer_state.json +0 -0
checkpoint-29000/training_args.bin +3 -0
config.json +303 -0
create_model.py +49 -0
eval_dev_results.json +9 -0
eval_test_results.json +9 -0
generation_config.json +13 -0
merges.txt +0 -0
model.safetensors +3 -0
preprocessor_config.json +9 -0
special_tokens_map.json +51 -0
tokenizer.json +0 -0
tokenizer_config.json +57 -0
train_results.json +9 -0
trainer_state.json +0 -0
training_args.bin +3 -0
vocab.json +0 -0
wandb/.DS_Store +0 -0
wandb/run-20250515_192303-7xkscxrj/files/config.yaml +1039 -0
wandb/run-20250515_192303-7xkscxrj/files/media/table/model_speed2size1_table_3555_34483c9cf24b143db620.table.json +1 -0
wandb/run-20250515_192303-7xkscxrj/files/media/table/model_speed2size2_table_3556_ffc3f22eaf8a279337f3.table.json +1 -0
wandb/run-20250515_192303-7xkscxrj/files/output.log +0 -0
wandb/run-20250515_192303-7xkscxrj/files/requirements.txt +184 -0
wandb/run-20250515_192303-7xkscxrj/files/wandb-metadata.json +96 -0
wandb/run-20250515_192303-7xkscxrj/files/wandb-summary.json +1 -0
wandb/run-20250515_192303-7xkscxrj/logs/debug-core.log +15 -0
wandb/run-20250515_192303-7xkscxrj/logs/debug-internal.log +17 -0
wandb/run-20250515_192303-7xkscxrj/logs/debug.log +35 -0
wandb/run-20250515_192303-7xkscxrj/run-7xkscxrj.wandb +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+wandb/run-20250515_192303-7xkscxrj/run-7xkscxrj.wandb filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,3 +1,296 @@
----
-license: mit
----

+---
+language: en
+model_name: Wav2Vec2-BART (Base) English ASR - VoxPopuli Best WER
+license: mit
+tags:
+- automatic-speech-recognition
+- speech-encoder-decoder
+- wav2vec2
+- bart
+- english
+- voxpopuli
+- generated_from_trainer
+- audio
+- master-thesis
+datasets:
+- facebook/voxpopuli
+base_model:
+- facebook/wav2vec2-base-en-voxpopuli-v2
+- facebook/bart-base
+model-index:
+- name: matejhornik/wav2vec2-base_bart-base_voxpopuli-en
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    dataset:
+      name: VoxPopuli (English, Test)
+      type: facebook/voxpopuli
+      config: en
+      split: test
+    metrics:
+    - name: WER
+      type: wer
+      value: 0.08848048503220916 # 8.85%
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    dataset:
+      name: VoxPopuli (English, Validation)
+      type: facebook/voxpopuli
+      config: en
+      split: validation
+    metrics:
+    - name: WER
+      type: wer
+      value: 0.08554638942253362 # 8.55%
+pipeline_tag: automatic-speech-recognition
+library_name: transformers
+---
+# Wav2Vec2-BART (Base) for English ASR on VoxPopuli - Best WER from Master's Thesis
+This repository contains the checkpoint for a `SpeechEncoderDecoderModel` fine-tuned for Automatic Speech Recognition (ASR) on the English portion of the VoxPopuli dataset. This model achieved the **best Word Error Rate (WER) of 8.85% on the VoxPopuli English test set** within the experimental framework of the Master's thesis "Effective Training of Neural Networks for Automatic Speech Recognition" by Matej Horník.
+The model leverages a pre-trained **Wav2Vec2 (Base)** encoder (`facebook/wav2vec2-base-en-voxpopuli-v2`) and a pre-trained **BART (Base)** decoder (`facebook/bart-base`), connected via convolutional adapter layers.
+## Thesis Context
+This model is a direct result of work conducted for the Master's thesis:
+*   **Title:** Effective Training of Neural Networks for Automatic Speech Recognition
+*   **Author:** Matej Horník
+*   **Supervisor:** Ing. Alexander Polok
+*   **Institution:** Brno University of Technology, Faculty of Information Technology
+*   **Year:** 2025
+*   **Thesis Link:** [Link to thesis PDF](https://www.vut.cz/en/students/final-thesis/detail/164401)
+> [!NOTE]
+> Link will be available after the thesis defense.
+### Thesis Abstract (English)
+This master's thesis focuses on improving the training efficiency and performance of encoder-decoder transformer models for Automatic Speech Recognition (ASR). It investigates the impact of initialization strategies using pre-trained components (Wav2Vec2, BART), the role of convolutional adapters, and Parameter-Efficient Fine-tuning (PEFT) methods like LoRA and DoRA. Experiments on LibriSpeech and VoxPopuli datasets confirmed that full pre-trained initialization is crucial for best Word Error Rate (WER) and convergence. An optimal number of adapters improved performance, while PEFT (especially LoRA) significantly reduced trainable parameters with comparable accuracy. Domain-specific encoder pre-training proved beneficial, and the encoder-decoder model outperformed a CTC baseline in accuracy, offering practical insights for efficient ASR training.
+## Model Details
+*   **Encoder:** `facebook/wav2vec2-base-en-voxpopuli-v2`. This is a Wav2Vec2 (Base) model pre-trained by Facebook on 24.1k hours of unlabeled English VoxPopuli data.
+*   **Decoder:** `facebook/bart-base`. This is a standard BART (Base) model.
+*   **Architecture:** `SpeechEncoderDecoderModel` from Hugging Face Transformers.
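+*   **Adapters:** 3 convolutional adapter layers were added on top of the encoder output to reduce its temporal resolution: each layer (kernel size 3, stride 2) roughly halves the sequence length, so the BART decoder cross-attends to frames of about 160 ms instead of the ~20 ms frames produced by Wav2Vec2.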
+*   **Feature Extractor:** The Wav2Vec2 feature extractor (initial CNN layers) was kept frozen during fine-tuning, as experiments showed this maintained performance while reducing trainable parameters.
+### Initial Model Construction
+The base model (before fine-tuning for this specific result) was constructed by combining the pre-trained `facebook/wav2vec2-base-en-voxpopuli-v2` (encoder) and `facebook/bart-base` (decoder) with `SpeechEncoderDecoderModel.from_encoder_decoder_pretrained`. The code that builds it is provided in [create_model.py](create_model.py):
+```bash
+python create_model.py
+```
+## Training Data
+### Data
+The model was fine-tuned on the `train` split of the English portion of the [VoxPopuli dataset](https://huggingface.co/datasets/facebook/voxpopuli) (`facebook/voxpopuli`, config `en`).
+Audio was resampled to 16 kHz. Text transcriptions were lowercased and tokenized with the BART tokenizer.
+### Procedure
+The model was fine-tuned using a modified version of the [`run_speech_recognition_seq2seq.py`](https://github.com/hornikmatej/thesis_mit/blob/main/run_speech_recognition_seq2seq.py) script (provided in the thesis materials and based on Hugging Face's example scripts).
+**Key Hyperparameters:**
+* **Optimizer:** AdamW
+* **Learning Rate:** `1e-4`
+* **LR Scheduler:** `cosine_with_min_lr` (min\_lr: `5e-9`)
+* **Warmup Steps:** 2000
+* **Batch Size (per device):** 96
+* **Gradient Accumulation Steps:** 1
+* **Number of Epochs:** 20
+* **Weight Decay:** 0.01
+* **Label Smoothing Factor:** 0.05
+* **Mixed Precision:** bf16
+* **SpecAugment:** Applied during training
+    * `mask_time_prob`: 0.25, `mask_time_length`: 30, `mask_time_min_masks`: 2
+    * `mask_feature_prob`: 0.3, `mask_feature_length`: 30, `mask_feature_min_masks`: 1
+* **Feature Extractor:** Frozen
+The full training command, including all arguments used, is available in the [thesis materials](https://github.com/hornikmatej/thesis_mit/blob/main/run_scripts/voxpopuli_best.sh).
+## Evaluation
+The model achieves the following Word Error Rate (WER) on the VoxPopuli English dataset:
+| Dataset Split | WER (%) | Loss  |
+|---------------|---------|-------|
+| Validation    | 8.55    | 1.056 |
+| Test          | 8.85    | 1.076 |
+For detailed training logs, metrics, and visualizations, please refer to the Weights & Biases report:
+[![Weights & Biases report](https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg)](https://api.wandb.ai/links/xhorni20-fitvut/2018dikj)
+## How to Use
+You can use this model for inference with the Hugging Face `transformers` library. The example below reads audio with `soundfile`; if your recordings are not sampled at 16 kHz, resample them first (e.g., with `librosa` or `torchaudio`).
+```python
+from transformers import SpeechEncoderDecoderModel, AutoProcessor
+import os
+import torch
+import soundfile as sf
+model_id = "matejhornik/wav2vec2-base_bart-base_voxpopuli-en"
+device = "cuda" if torch.cuda.is_available() else "cpu"
+# Load the processor (feature extractor and tokenizer)
+processor = AutoProcessor.from_pretrained(model_id)
+# Load the model
+model = SpeechEncoderDecoderModel.from_pretrained(model_id).to(device)
+model.eval()
+def transcribe_audio(audio_path):
+    """Loads a 16 kHz audio file and returns its transcription."""
+    if not os.path.isfile(audio_path):
+        raise FileNotFoundError(audio_path)
+    speech_array, sampling_rate = sf.read(audio_path, dtype="float32")
+    # Down-mix multi-channel recordings to mono
+    if speech_array.ndim > 1:
+        speech_array = speech_array.mean(axis=1)
+    # Ensure audio is 16kHz as expected by the model
+    target_sr = processor.feature_extractor.sampling_rate
+    if sampling_rate != target_sr:
+        raise ValueError(f"Audio sampling rate {sampling_rate} Hz does not match the model's required {target_sr} Hz. Please resample.")
+    # Preprocess the audio; the Wav2Vec2 feature extractor returns `input_values`
+    # and no attention mask (return_attention_mask is false in preprocessor_config.json)
+    inputs = processor(speech_array, sampling_rate=target_sr, return_tensors="pt")
+    input_values = inputs.input_values.to(device)
+    # Generate transcription (beam search settings are read from generation_config.json)
+    with torch.no_grad():
+        predicted_ids = model.generate(input_values, max_length=128)
+    # Decode the transcription
+    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+    return transcription[0]
+# Example usage:
+audio_file_path = "path/to/your/audio.wav"
+try:
+    transcription = transcribe_audio(audio_file_path)
+    print(f"Transcription: {transcription}")
+except ValueError as e:
+    print(e)
+except FileNotFoundError:
+    print(f"Audio file not found at: {audio_file_path}. Please provide a valid path.")
+```
+## Reproducing Evaluation on VoxPopuli
+To reproduce the evaluation results on the VoxPopuli test set:
+```python
+from datasets import load_dataset, Audio
+from transformers import SpeechEncoderDecoderModel, AutoProcessor
+import torch
+from jiwer import wer
+from tqdm import tqdm
+model_id = "matejhornik/wav2vec2-base_bart-base_voxpopuli-en"
+dataset_name = "facebook/voxpopuli"
+dataset_config = "en"
+split = "test" # or "validation"
+device = "cuda" if torch.cuda.is_available() else "cpu"
+# Load processor and model
+processor = AutoProcessor.from_pretrained(model_id)
+model = SpeechEncoderDecoderModel.from_pretrained(model_id).to(device)
+model.eval() # Set model to evaluation mode
+target_sr = processor.feature_extractor.sampling_rate  # 16 kHz
+# Load dataset
+# Note: You might need to authenticate with Hugging Face if the dataset requires it
+# from huggingface_hub import login
+voxpopuli_test = load_dataset(dataset_name, dataset_config, split=split, streaming=False) # Set streaming=True for large datasets if memory is an issue
+# Decode (and, if needed, resample) audio to the model's sampling rate on access
+voxpopuli_test = voxpopuli_test.cast_column("audio", Audio(sampling_rate=target_sr))
+def map_to_pred(batch):
+    """Transcribes a single sample and attaches its prediction and reference."""
+    audio = batch["audio"]
+    inputs = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt")
+    # The Wav2Vec2 feature extractor returns `input_values` (not `input_features`)
+    input_values = inputs.input_values.to(device)
+    with torch.no_grad():
+        predicted_ids = model.generate(input_values, max_length=128)
+    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+    batch["prediction"] = transcription[0]
+    batch["reference"] = batch["normalized_text"]
+    return batch
+predictions = []
+references = []
+failed = 0
+for sample in tqdm(voxpopuli_test):
+    try:
+        processed_sample = map_to_pred(sample)
+        predictions.append(processed_sample["prediction"])
+        references.append(processed_sample["reference"])
+    except Exception as e:
+        failed += 1
+        print(f"Error processing sample: {e}")
+# Calculate WER (failed samples are excluded, so report how many were skipped)
+if predictions and references:
+    current_wer = wer(references, predictions)
+    print(f"WER on {split} set: {current_wer:.4f} ({len(predictions)} samples, {failed} skipped)")
+else:
+    print("No samples processed or an error occurred.")
+# Expected WER on test set: 0.0885
+# Expected WER on validation set: 0.0855
+```
+### Framework Versions
+This model was trained using:
+- Python: `^3.10`
+- Transformers: `~4.46.3`
+- PyTorch: `~2.5.1`
+- Datasets: `^3.2.0`
+- PEFT: `^0.14.0`
+- Accelerate: `^1.4.0`
+- Evaluate: `^0.4.3`
+- WandB: `^0.19.7`
+## Citation
+If you use this model or findings from the thesis, please cite:
+[![CITE](https://excel.fit.vutbr.cz/wp-content/images/2023/FIT_color_CMYK_EN.svg)](https://www.vut.cz/en/students/final-thesis/detail/164401)
+```bibtex
+@mastersthesis{Hornik2025EffectiveTraining,
+  author       = {Horník, Matej},
+  title        = {Effective Training of Neural Networks for Automatic Speech Recognition},
+  school       = {Brno University of Technology, Faculty of Information Technology},
+  year         = {2025},
+  supervisor   = {Polok, Alexander},
+  type         = {Master's Thesis},
+  note         = {Online. Available at: \url{https://www.vut.cz/en/students/final-thesis/detail/164401}}
+}
+```
+## Acknowledgements
+- My supervisor, Ing. Alexander Polok, for his valuable guidance and support.
+- The Hugging Face team for their comprehensive transformers, datasets, and evaluate libraries.
+- The creators of Wav2Vec2, BART, and the VoxPopuli dataset.
+## Contact
+For questions, feedback, or collaboration opportunities related to this thesis or anything else, feel free to reach out:
+- **Email:** [email protected] / [email protected]
+- **GitHub:** [hornikmatej](https://github.com/hornikmatej)
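
Worked example for the adapter setup described in the README's Model Details section. It traces how many encoder frames the BART decoder cross-attends to, using the `conv_kernel`, `conv_stride`, `num_adapter_layers` and `adapter_stride` values from `config.json`. It assumes the length formula transformers uses for Wav2Vec2 (unpadded convolutions in the CNN feature extractor; adapter convs with kernel 3, stride 2 and padding 1); the helper names are hypothetical.

```python
# Values copied from config.json ("encoder" section)
CONV_KERNEL = [10, 3, 3, 3, 3, 2, 2]
CONV_STRIDE = [5, 2, 2, 2, 2, 2, 2]
NUM_ADAPTER_LAYERS = 3
ADAPTER_STRIDE = 2
SAMPLING_RATE = 16_000


def conv_out_length(length: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded 1-D convolution."""
    return (length - kernel) // stride + 1


def encoder_output_length(num_samples: int, add_adapter: bool = True) -> int:
    """Hypothetical helper: encoder frames the BART decoder cross-attends to."""
    length = num_samples
    # CNN feature extractor: total stride 5 * 2**6 = 320 samples, i.e. 20 ms at 16 kHz
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = conv_out_length(length, kernel, stride)
    if add_adapter:
        # Each adapter conv (kernel 3, stride 2, padding 1) gives the same output
        # length as an unpadded kernel-1, stride-2 conv, i.e. it roughly halves the length
        for _ in range(NUM_ADAPTER_LAYERS):
            length = conv_out_length(length, 1, ADAPTER_STRIDE)
    return length


for seconds in (1, 10):
    n = seconds * SAMPLING_RATE
    print(f"{seconds:>2} s: {encoder_output_length(n, add_adapter=False)} frames "
          f"-> {encoder_output_length(n)} after adapters")
```

Under these assumptions, 10 s of audio becomes 499 frames of about 20 ms, which the three adapters shrink to 63 frames of about 160 ms, an 8x shorter sequence for cross-attention.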

all_results.json ADDED

	@@ -0,0 +1,20 @@

+{
+    "epoch": 20.0,
+    "eval_dev_loss": 1.0564184188842773,
+    "eval_dev_runtime": 121.5437,
+    "eval_dev_samples_per_second": 13.09,
+    "eval_dev_steps_per_second": 0.14,
+    "eval_dev_wer": 0.08554638942253362,
+    "eval_samples": 1705,
+    "eval_test_loss": 1.0758554935455322,
+    "eval_test_runtime": 132.2526,
+    "eval_test_samples_per_second": 12.892,
+    "eval_test_steps_per_second": 0.136,
+    "eval_test_wer": 0.08848048503220916,
+    "total_flos": 0.0,
+    "train_loss": 1.6298207611684346,
+    "train_runtime": 35628.5116,
+    "train_samples": 167046,
+    "train_samples_per_second": 93.771,
+    "train_steps_per_second": 0.977
+}
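
A quick consistency check on `all_results.json`, combined with the batch size from the README. The logged throughput implies an effective batch of about 96 samples per step, which equals the per-device batch size with no gradient accumulation, so the sketch assumes a single device and a kept final partial batch. Under that assumption each epoch is 1,741 steps and the run is 34,820 steps; `train_runtime × train_steps_per_second` ≈ 34,809 agrees up to the rounding of the logged rate.

```python
import math

# Values copied from all_results.json above
results = {
    "epoch": 20.0,
    "train_runtime": 35628.5116,
    "train_samples": 167046,
    "train_samples_per_second": 93.771,
    "train_steps_per_second": 0.977,
}
PER_DEVICE_BATCH = 96  # from the README's hyperparameter list
GRAD_ACCUM = 1

# Effective batch implied by the logged throughput (samples/s divided by steps/s)
implied_batch = results["train_samples_per_second"] / results["train_steps_per_second"]

# Assumption: one device and the last partial batch of each epoch is kept
steps_per_epoch = math.ceil(results["train_samples"] / (PER_DEVICE_BATCH * GRAD_ACCUM))
total_steps = steps_per_epoch * int(results["epoch"])

print(f"implied effective batch: {implied_batch:.1f}")
print(f"steps/epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"checkpoint-29000 is at about epoch {29000 / steps_per_epoch:.1f}")
print(f"wall-clock training time: {results['train_runtime'] / 3600:.1f} h")
```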

checkpoint-29000/config.json ADDED

	@@ -0,0 +1,303 @@

+{
+  "_name_or_path": "./seq2seq_wav2vec2_bart-base_24k-en-voxpopuli",
+  "architectures": [
+    "SpeechEncoderDecoderModel"
+  ],
+  "decoder": {
+    "_attn_implementation_autoset": true,
+    "_name_or_path": "facebook/bart-base",
+    "activation_dropout": 0.1,
+    "activation_function": "gelu",
+    "add_bias_logits": false,
+    "add_cross_attention": true,
+    "add_final_layer_norm": false,
+    "architectures": [
+      "BartModel"
+    ],
+    "attention_dropout": 0.1,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": 0,
+    "chunk_size_feed_forward": 0,
+    "classif_dropout": 0.1,
+    "classifier_dropout": 0.0,
+    "cross_attention_hidden_size": null,
+    "d_model": 768,
+    "decoder_attention_heads": 12,
+    "decoder_ffn_dim": 3072,
+    "decoder_layerdrop": 0.0,
+    "decoder_layers": 6,
+    "decoder_start_token_id": 2,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "dropout": 0.1,
+    "early_stopping": true,
+    "encoder_attention_heads": 12,
+    "encoder_ffn_dim": 3072,
+    "encoder_layerdrop": 0.0,
+    "encoder_layers": 6,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 2,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": 0,
+    "forced_eos_token_id": 2,
+    "gradient_checkpointing": false,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1",
+      "2": "LABEL_2"
+    },
+    "init_std": 0.02,
+    "is_decoder": true,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1,
+      "LABEL_2": 2
+    },
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "max_position_embeddings": 1024,
+    "min_length": 0,
+    "model_type": "bart",
+    "no_repeat_ngram_size": 3,
+    "normalize_before": false,
+    "normalize_embedding": true,
+    "num_beam_groups": 1,
+    "num_beams": 4,
+    "num_hidden_layers": 6,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": 1,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "scale_embedding": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": {
+      "summarization": {
+        "length_penalty": 1.0,
+        "max_length": 128,
+        "min_length": 12,
+        "num_beams": 4
+      },
+      "summarization_cnn": {
+        "length_penalty": 2.0,
+        "max_length": 142,
+        "min_length": 56,
+        "num_beams": 4
+      },
+      "summarization_xsum": {
+        "length_penalty": 1.0,
+        "max_length": 62,
+        "min_length": 11,
+        "num_beams": 6
+      }
+    },
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": "float32",
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "use_cache": true,
+    "vocab_size": 50265
+  },
+  "decoder_start_token_id": 0,
+  "encoder": {
+    "_attn_implementation_autoset": true,
+    "_name_or_path": "facebook/wav2vec2-base-en-voxpopuli-v2",
+    "activation_dropout": 0.0,
+    "adapter_attn_dim": null,
+    "adapter_kernel_size": 3,
+    "adapter_stride": 2,
+    "add_adapter": true,
+    "add_cross_attention": false,
+    "apply_spec_augment": true,
+    "architectures": [
+      "Wav2Vec2ForPreTraining"
+    ],
+    "attention_dropout": 0.1,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": 1,
+    "chunk_size_feed_forward": 0,
+    "classifier_proj_size": 256,
+    "codevector_dim": 256,
+    "contrastive_logits_temperature": 0.1,
+    "conv_bias": false,
+    "conv_dim": [
+      512,
+      512,
+      512,
+      512,
+      512,
+      512,
+      512
+    ],
+    "conv_kernel": [
+      10,
+      3,
+      3,
+      3,
+      3,
+      2,
+      2
+    ],
+    "conv_stride": [
+      5,
+      2,
+      2,
+      2,
+      2,
+      2,
+      2
+    ],
+    "cross_attention_hidden_size": null,
+    "ctc_loss_reduction": "sum",
+    "ctc_zero_infinity": false,
+    "decoder_start_token_id": null,
+    "diversity_loss_weight": 0.1,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "do_stable_layer_norm": false,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 2,
+    "exponential_decay_length_penalty": null,
+    "feat_extract_activation": "gelu",
+    "feat_extract_norm": "group",
+    "feat_proj_dropout": 0.0,
+    "feat_quantizer_dropout": 0.0,
+    "final_dropout": 0.0,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "freeze_feat_extract_train": true,
+    "hidden_act": "gelu",
+    "hidden_dropout": 0.1,
+    "hidden_size": 768,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "initializer_range": 0.02,
+    "intermediate_size": 3072,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layer_norm_eps": 1e-05,
+    "layerdrop": 0.0,
+    "length_penalty": 1.0,
+    "mask_channel_length": 10,
+    "mask_channel_min_space": 1,
+    "mask_channel_other": 0.0,
+    "mask_channel_prob": 0.0,
+    "mask_channel_selection": "static",
+    "mask_feature_length": 30,
+    "mask_feature_min_masks": 1,
+    "mask_feature_prob": 0.3,
+    "mask_time_length": 30,
+    "mask_time_min_masks": 2,
+    "mask_time_min_space": 1,
+    "mask_time_other": 0.0,
+    "mask_time_prob": 0.25,
+    "mask_time_selection": "static",
+    "max_length": 20,
+    "min_length": 0,
+    "model_type": "wav2vec2",
+    "no_mask_channel_overlap": false,
+    "no_mask_time_overlap": false,
+    "no_repeat_ngram_size": 0,
+    "num_adapter_layers": 3,
+    "num_attention_heads": 12,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_codevector_groups": 2,
+    "num_codevectors_per_group": 320,
+    "num_conv_pos_embedding_groups": 16,
+    "num_conv_pos_embeddings": 128,
+    "num_feat_extract_layers": 7,
+    "num_hidden_layers": 12,
+    "num_negatives": 100,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_size": 768,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": 0,
+    "prefix": null,
+    "problem_type": null,
+    "proj_codevector_dim": 256,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "tdnn_dilation": [
+      1,
+      2,
+      3,
+      1,
+      1
+    ],
+    "tdnn_dim": [
+      512,
+      512,
+      512,
+      512,
+      1500
+    ],
+    "tdnn_kernel": [
+      5,
+      3,
+      3,
+      1,
+      1
+    ],
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": "float32",
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "use_weighted_layer_sum": false,
+    "vocab_size": 32,
+    "xvector_output_dim": 512
+  },
+  "eos_token_id": 2,
+  "forced_decoder_ids": null,
+  "is_encoder_decoder": true,
+  "max_length": null,
+  "model_type": "speech-encoder-decoder",
+  "pad_token_id": 1,
+  "processor_class": "Wav2Vec2Processor",
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.3",
+  "use_cache": false
+}
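
The encoder config above stores the SpecAugment settings listed in the README (`mask_time_prob` 0.25, `mask_time_length` 30, `mask_time_min_masks` 2; `mask_feature_prob` 0.3, `mask_feature_length` 30, `mask_feature_min_masks` 1). The sketch below estimates how many spans are masked per utterance. It is modeled on my reading of the span-count rule in transformers' Wav2Vec2 masking helper rather than imported from it, so treat it as an approximation; `num_masked_spans` is a hypothetical name. Time masks act on the ~20 ms frames from the CNN feature extractor (499 frames for a 10 s clip, 49 for 1 s), feature masks on the 768 hidden channels.

```python
def num_masked_spans(length: int, mask_prob: float, mask_length: int,
                     min_masks: int, epsilon: float = 0.0) -> int:
    """Approximate number of spans of `mask_length` masked in `length` positions.

    During training `epsilon` is drawn uniformly from [0, 1) to randomize rounding.
    """
    spans = int(mask_prob * length / mask_length + epsilon)
    spans = max(spans, min_masks)
    # Never request more masked positions than the sequence holds
    if spans * mask_length > length:
        spans = length // mask_length
    # Every span needs a valid start position
    if length - (mask_length - 1) < spans:
        spans = max(length - (mask_length - 1), 0)
    return spans


# Time masking over encoder frames (499 frames for 10 s of audio, 49 for 1 s)
time_10s = num_masked_spans(499, mask_prob=0.25, mask_length=30, min_masks=2)
time_1s = num_masked_spans(49, mask_prob=0.25, mask_length=30, min_masks=2)
# Feature masking over the 768 hidden channels
feat = num_masked_spans(768, mask_prob=0.3, mask_length=30, min_masks=1)
print(time_10s, time_1s, feat)
```

With `epsilon = 0`, a 10 s clip gets 4 time spans of 30 frames (about 600 ms each; spans may overlap), while a 1 s clip is capped at a single span covering 30 of its 49 frames. Feature masking hides roughly 7 × 30 of the 768 channels.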

checkpoint-29000/generation_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "bos_token_id": 0,
+  "decoder_start_token_id": 2,
+  "early_stopping": true,
+  "eos_token_id": 2,
+  "forced_bos_token_id": 0,
+  "forced_eos_token_id": 2,
+  "max_length": 128,
+  "no_repeat_ngram_size": 3,
+  "num_beams": 4,
+  "pad_token_id": 1,
+  "transformers_version": "4.46.3"
+}
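
`generation_config.json` makes decoding default to beam search (`num_beams` 4, `max_length` 128) with `no_repeat_ngram_size` 3, i.e. no 3-gram of tokens may appear twice in a hypothesis. The sketch below shows which next tokens that rule bans for a given prefix. It is a simplified illustration (words stand in for BART sub-word tokens, and `banned_next_tokens` is a hypothetical name), not the transformers implementation.

```python
def banned_next_tokens(prefix: list[str], n: int = 3) -> set[str]:
    """Tokens that cannot follow `prefix` without repeating an n-gram."""
    if len(prefix) < n - 1:
        return set()
    context = tuple(prefix[-(n - 1):])  # the last n-1 tokens
    banned = set()
    # Any earlier occurrence of the context bans the token that followed it
    for i in range(len(prefix) - n + 1):
        if tuple(prefix[i:i + n - 1]) == context:
            banned.add(prefix[i + n - 1])
    return banned


print(banned_next_tokens(["thank", "you", "thank", "you"]))
```

For speech this can matter when a speaker genuinely repeats a phrase: the hypothesis above cannot continue with a third "thank you". If that is a problem for your data, generation settings can be overridden per call, e.g. `model.generate(input_values, no_repeat_ngram_size=0)`.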

checkpoint-29000/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:70aba1c61e1c77e1aca95ba945b9189c8a5d45eded939dd13ad43cd79379c1ae
+size 804433536

checkpoint-29000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2eef776121c682b2eb63f445c736f439a4fb4b00104d44a1b42327ed9239ae89
+size 1575479010
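
The LFS pointers record file sizes, which allow a back-of-the-envelope check of the README's statement that the Wav2Vec2 feature extractor was frozen. The sketch assumes 4 bytes per parameter in `model.safetensors` (float32, per `torch_dtype`) and 8 bytes per trainable parameter in `optimizer.pt` (two float32 AdamW moment buffers), ignoring headers and pickling overhead; the result is an estimate, not an exact count.

```python
# Sizes copied from the Git LFS pointers above
MODEL_BYTES = 804_433_536        # checkpoint-29000/model.safetensors
OPTIMIZER_BYTES = 1_575_479_010  # checkpoint-29000/optimizer.pt

total_params = MODEL_BYTES / 4           # assumption: float32 weights
trainable_params = OPTIMIZER_BYTES / 8   # assumption: two float32 moments per param

# Wav2Vec2 CNN feature extractor per config.json: conv_bias=false, conv_dim=512,
# and group norm (affine weight + bias) on the first conv layer only
conv_dim = 512
conv_kernel = [10, 3, 3, 3, 3, 2, 2]
in_channels = [1] + [conv_dim] * 6
feature_extractor_params = sum(c_in * conv_dim * k for c_in, k in zip(in_channels, conv_kernel))
feature_extractor_params += 2 * conv_dim  # GroupNorm weight and bias in layer 0

print(f"total     ~ {total_params / 1e6:.1f} M")
print(f"trainable ~ {trainable_params / 1e6:.1f} M")
print(f"frozen    ~ {(total_params - trainable_params) / 1e6:.1f} M "
      f"(feature extractor: {feature_extractor_params / 1e6:.1f} M)")
```

Under these assumptions the model has roughly 201 M parameters, of which about 197 M carried optimizer state. The ~4.2 M gap is consistent with the size of the CNN feature extractor computed from `conv_dim` and `conv_kernel`, i.e. with that block being excluded from optimization.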

checkpoint-29000/preprocessor_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "do_normalize": true,
+  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+  "feature_size": 1,
+  "padding_side": "right",
+  "padding_value": 0,
+  "return_attention_mask": false,
+  "sampling_rate": 16000
+}
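
`preprocessor_config.json` enables `do_normalize` and disables `return_attention_mask`, so the processor returns only `input_values`, with each utterance scaled to zero mean and unit variance. The NumPy sketch below reproduces that preprocessing for a single waveform, assuming the normalization formula `(x - mean) / sqrt(var + 1e-7)` as I understand `Wav2Vec2FeatureExtractor` to apply it. Averaging stereo channels to mono is an extra assumption, and `prepare_waveform` is a hypothetical helper.

```python
import numpy as np

TARGET_SR = 16_000  # "sampling_rate" in preprocessor_config.json


def prepare_waveform(audio: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Mono, 16 kHz, zero-mean/unit-variance waveform (sketch of do_normalize=True)."""
    if sampling_rate != TARGET_SR:
        raise ValueError(f"expected {TARGET_SR} Hz audio, got {sampling_rate} Hz; resample first")
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:  # e.g. soundfile returns (frames, channels) for stereo files
        audio = audio.mean(axis=1)
    # Per-utterance normalization; the small constant avoids division by zero on silence
    return (audio - audio.mean()) / np.sqrt(audio.var() + 1e-7)


rng = np.random.default_rng(0)
stereo = rng.normal(loc=0.3, scale=0.05, size=(16_000, 2))
x = prepare_waveform(stereo, 16_000)
print(x.shape, float(x.mean()), float(x.std()))
```

In practice you would pass the raw waveform to the processor and let it normalize; the sketch only makes that behavior visible.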

checkpoint-29000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bdfd60e90490dbbb552aa355c28f0d83eeb757980adc4347084c8c47548db074
+size 14244

checkpoint-29000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:85207000dd7d5fd86930aaf7fe5ae36f2540075686490904e0ea67612f44c838
+size 1064
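
The README lists a `cosine_with_min_lr` schedule with `min_lr` 5e-9, peak learning rate 1e-4 and 2000 warmup steps; `scheduler.pt` stores its state. The sketch below evaluates that schedule at a few steps, assuming it is linear warmup followed by a half-cosine decay from the peak to `min_lr` (my understanding of the transformers scheduler of that name) and assuming 34,820 total steps (1,741 steps per epoch × 20 epochs on a single device). `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 1e-4
MIN_LR = 5e-9
WARMUP_STEPS = 2000
TOTAL_STEPS = 34_820  # assumption: ceil(167046 / 96) * 20 epochs on one device


def lr_at(step: int) -> float:
    """Linear warmup, then half-cosine decay from PEAK_LR down to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    min_ratio = MIN_LR / PEAK_LR
    return PEAK_LR * (cosine * (1.0 - min_ratio) + min_ratio)


for step in (0, 1000, 2000, 18410, 29000, 34820):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate has already dropped below 1e-5 by checkpoint-29000 and reaches `min_lr` only at the final step.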

checkpoint-29000/trainer_state.json ADDED

The diff for this file is too large to render.

checkpoint-29000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1f161da6617cf4269393afbc8cb8565cefc83bb8de7610adde49de693563d3f3
+size 5624

config.json ADDED

	@@ -0,0 +1,303 @@

+{
+  "_name_or_path": "./seq2seq_wav2vec2_bart-base_24k-en-voxpopuli",
+  "architectures": [
+    "SpeechEncoderDecoderModel"
+  ],
+  "decoder": {
+    "_attn_implementation_autoset": true,
+    "_name_or_path": "facebook/bart-base",
+    "activation_dropout": 0.1,
+    "activation_function": "gelu",
+    "add_bias_logits": false,
+    "add_cross_attention": true,
+    "add_final_layer_norm": false,
+    "architectures": [
+      "BartModel"
+    ],
+    "attention_dropout": 0.1,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": 0,
+    "chunk_size_feed_forward": 0,
+    "classif_dropout": 0.1,
+    "classifier_dropout": 0.0,
+    "cross_attention_hidden_size": null,
+    "d_model": 768,
+    "decoder_attention_heads": 12,
+    "decoder_ffn_dim": 3072,
+    "decoder_layerdrop": 0.0,
+    "decoder_layers": 6,
+    "decoder_start_token_id": 2,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "dropout": 0.1,
+    "early_stopping": true,
+    "encoder_attention_heads": 12,
+    "encoder_ffn_dim": 3072,
+    "encoder_layerdrop": 0.0,
+    "encoder_layers": 6,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 2,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": 0,
+    "forced_eos_token_id": 2,
+    "gradient_checkpointing": false,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1",
+      "2": "LABEL_2"
+    },
+    "init_std": 0.02,
+    "is_decoder": true,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1,
+      "LABEL_2": 2
+    },
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "max_position_embeddings": 1024,
+    "min_length": 0,
+    "model_type": "bart",
+    "no_repeat_ngram_size": 3,
+    "normalize_before": false,
+    "normalize_embedding": true,
+    "num_beam_groups": 1,
+    "num_beams": 4,
+    "num_hidden_layers": 6,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": 1,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "scale_embedding": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": {
+      "summarization": {
+        "length_penalty": 1.0,
+        "max_length": 128,
+        "min_length": 12,
+        "num_beams": 4
+      },
+      "summarization_cnn": {
+        "length_penalty": 2.0,
+        "max_length": 142,
+        "min_length": 56,
+        "num_beams": 4
+      },
+      "summarization_xsum": {
+        "length_penalty": 1.0,
+        "max_length": 62,
+        "min_length": 11,
+        "num_beams": 6
+      }
+    },
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": "float32",
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "use_cache": true,
+    "vocab_size": 50265
+  },
+  "decoder_start_token_id": 0,
+  "encoder": {
+    "_attn_implementation_autoset": true,
+    "_name_or_path": "facebook/wav2vec2-base-en-voxpopuli-v2",
+    "activation_dropout": 0.0,
+    "adapter_attn_dim": null,
+    "adapter_kernel_size": 3,
+    "adapter_stride": 2,
+    "add_adapter": true,
+    "add_cross_attention": false,
+    "apply_spec_augment": true,
+    "architectures": [
+      "Wav2Vec2ForPreTraining"
+    ],
+    "attention_dropout": 0.1,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": 1,
+    "chunk_size_feed_forward": 0,
+    "classifier_proj_size": 256,
+    "codevector_dim": 256,
+    "contrastive_logits_temperature": 0.1,
+    "conv_bias": false,
+    "conv_dim": [
+      512,
+      512,
+      512,
+      512,
+      512,
+      512,
+      512
+    ],
+    "conv_kernel": [
+      10,
+      3,
+      3,
+      3,
+      3,
+      2,
+      2
+    ],
+    "conv_stride": [
+      5,
+      2,
+      2,
+      2,
+      2,
+      2,
+      2
+    ],
+    "cross_attention_hidden_size": null,
+    "ctc_loss_reduction": "sum",
+    "ctc_zero_infinity": false,
+    "decoder_start_token_id": null,
+    "diversity_loss_weight": 0.1,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "do_stable_layer_norm": false,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 2,
+    "exponential_decay_length_penalty": null,
+    "feat_extract_activation": "gelu",
+    "feat_extract_norm": "group",
+    "feat_proj_dropout": 0.0,
+    "feat_quantizer_dropout": 0.0,
+    "final_dropout": 0.0,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "freeze_feat_extract_train": true,
+    "hidden_act": "gelu",
+    "hidden_dropout": 0.1,
+    "hidden_size": 768,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "initializer_range": 0.02,
+    "intermediate_size": 3072,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layer_norm_eps": 1e-05,
+    "layerdrop": 0.0,
+    "length_penalty": 1.0,
+    "mask_channel_length": 10,
+    "mask_channel_min_space": 1,
+    "mask_channel_other": 0.0,
+    "mask_channel_prob": 0.0,
+    "mask_channel_selection": "static",
+    "mask_feature_length": 30,
+    "mask_feature_min_masks": 1,
+    "mask_feature_prob": 0.3,
+    "mask_time_length": 30,
+    "mask_time_min_masks": 2,
+    "mask_time_min_space": 1,
+    "mask_time_other": 0.0,
+    "mask_time_prob": 0.25,
+    "mask_time_selection": "static",
+    "max_length": 20,
+    "min_length": 0,
+    "model_type": "wav2vec2",
+    "no_mask_channel_overlap": false,
+    "no_mask_time_overlap": false,
+    "no_repeat_ngram_size": 0,
+    "num_adapter_layers": 3,
+    "num_attention_heads": 12,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_codevector_groups": 2,
+    "num_codevectors_per_group": 320,
+    "num_conv_pos_embedding_groups": 16,
+    "num_conv_pos_embeddings": 128,
+    "num_feat_extract_layers": 7,
+    "num_hidden_layers": 12,
+    "num_negatives": 100,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_size": 768,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": 0,
+    "prefix": null,
+    "problem_type": null,
+    "proj_codevector_dim": 256,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "tdnn_dilation": [
+      1,
+      2,
+      3,
+      1,
+      1
+    ],
+    "tdnn_dim": [
+      512,
+      512,
+      512,
+      512,
+      1500
+    ],
+    "tdnn_kernel": [
+      5,
+      3,
+      3,
+      1,
+      1
+    ],
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": "float32",
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "use_weighted_layer_sum": false,
+    "vocab_size": 32,
+    "xvector_output_dim": 512
+  },
+  "eos_token_id": 2,
+  "forced_decoder_ids": null,
+  "is_encoder_decoder": true,
+  "max_length": null,
+  "model_type": "speech-encoder-decoder",
+  "pad_token_id": 1,
+  "processor_class": "Wav2Vec2Processor",
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.3",
+  "use_cache": false
+}
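The encoder settings above (`num_feat_extract_layers: 7`, `num_adapter_layers: 3`) set how many vectors the BART decoder cross-attends to per second of audio. The sketch below assumes three things. The `conv_kernel`/`conv_stride` values are the ones recorded in this run's W&B config. Each adapter layer is a stride-2 convolution with kernel 3 and padding 1, which halves the length and rounds up. The standard 1-D convolution output-length formula applies. The helper names are hypothetical.

```python
# wav2vec2-base convolutional feature encoder (values from the run config).
CONV_KERNEL = [10, 3, 3, 3, 3, 2, 2]
CONV_STRIDE = [5, 2, 2, 2, 2, 2, 2]
ADAPTER_STRIDE = 2
NUM_ADAPTER_LAYERS = 3


def conv_out_length(length: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length of a 1-D convolution without dilation."""
    return (length + 2 * padding - kernel) // stride + 1


def encoder_output_length(num_samples: int) -> int:
    """Number of encoder vectors produced for a raw 16 kHz waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = conv_out_length(length, kernel, stride)
    # Assumed adapter layer: kernel 3, stride 2, padding 1.
    for _ in range(NUM_ADAPTER_LAYERS):
        length = conv_out_length(length, kernel=3, stride=ADAPTER_STRIDE, padding=1)
    return length


print(encoder_output_length(16_000))  # one second of audio; prints 7
```

Without the adapter, the same second of audio yields 49 frames, which is roughly 50 per second (320x downsampling). The three stride-2 adapter layers cut this by a further factor of about 8, so the decoder attends over a much shorter sequence.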

create_model.py ADDED

	@@ -0,0 +1,49 @@

+from transformers import SpeechEncoderDecoderModel, AutoFeatureExtractor, AutoTokenizer
+# Encoder for speech feature extraction
+encoder_checkpoint = "facebook/wav2vec2-base-en-voxpopuli-v2"
+# Decoder for text generation + its tokenizer
+decoder_checkpoint = "facebook/bart-base"
+# Path where this initial combined model is saved
+# This path is then used as --model_name_or_path in the fine-tuning script
+# e.g., "./seq2seq_wav2vec2_bart-base_24k-en-voxpopuli"
+INITIAL_MODEL_SAVE_PATH = "path_to_save_initial_model"
+model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
+    encoder_checkpoint,
+    decoder_checkpoint,
+    encoder_add_adapter=True,  # Enables adapter mechanism
+    encoder_num_adapter_layers=3,  # Specifies 3 adapter layers
+)
+# Configure encoder properties (example from thesis experiments)
+model.config.encoder.feat_proj_dropout = 0.0
+# model.config.encoder.mask_time_prob = 0.0 # No SpecAugment at initialization
+# Configure decoder start token, pad token, eos token from the decoder's config
+model.config.decoder_start_token_id = model.decoder.config.bos_token_id
+model.config.pad_token_id = (
+    model.decoder.config.pad_token_id
+)  # Or tokenizer.pad_token_id
+model.config.eos_token_id = (
+    model.decoder.config.eos_token_id
+)  # Or tokenizer.eos_token_id
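+# Configure generation length and training-time settings
+model.config.max_length = 128  # maximum generated sequence length
+model.config.encoder.layerdrop = 0.0  # disable encoder LayerDrop
+model.config.use_cache = False  # disable the decoder KV cache; important for training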
+# Save the initialized model, feature extractor, and tokenizer
+model.save_pretrained(INITIAL_MODEL_SAVE_PATH)
+feature_extractor = AutoFeatureExtractor.from_pretrained(encoder_checkpoint)
+feature_extractor.save_pretrained(INITIAL_MODEL_SAVE_PATH)
+tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)
+tokenizer.save_pretrained(INITIAL_MODEL_SAVE_PATH)
+print(
+    f"Initialized model, feature extractor, and tokenizer saved to {INITIAL_MODEL_SAVE_PATH}"
+)
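The script copies the BART decoder's special-token ids (`decoder_start_token_id`, `pad_token_id`, `eos_token_id`) into the combined config. In seq2seq training, these ids are typically used to build decoder inputs from the labels. The labels are shifted one position to the right, the start token is prepended, and the loss-ignore value `-100` is replaced by the pad id. Below is a minimal pure-Python sketch of that shift. It assumes BART's ids (`<s>` = 0, `<pad>` = 1, `</s>` = 2) and uses arbitrary example word ids. The function name is hypothetical and only mirrors the idea behind transformers' `shift_tokens_right`.

```python
from typing import List

IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss


def shift_right(labels: List[int], decoder_start_token_id: int, pad_token_id: int) -> List[int]:
    """Build decoder input ids from a label sequence by shifting it one step right."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # Positions masked out of the loss must still hold a valid token id.
    return [pad_token_id if tok == IGNORE_INDEX else tok for tok in shifted]


# "<s> w1 w2 </s>" followed by two ignored (padded) positions; 713 and 16 are arbitrary ids.
labels = [0, 713, 16, 2, IGNORE_INDEX, IGNORE_INDEX]
print(shift_right(labels, decoder_start_token_id=0, pad_token_id=1))  # [0, 0, 713, 16, 2, 1]
```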

eval_dev_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 20.0,
+    "eval_dev_loss": 1.0564184188842773,
+    "eval_dev_runtime": 121.5437,
+    "eval_dev_samples_per_second": 13.09,
+    "eval_dev_steps_per_second": 0.14,
+    "eval_dev_wer": 0.08554638942253362,
+    "eval_samples": 1591
+}
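`eval_dev_wer` is a word error rate. It counts the word-level edits between hypothesis and reference (substitutions S, deletions D, insertions I) and divides them by the number of reference words N: WER = (S + D + I) / N. The sketch below is a minimal, self-contained version built on a Levenshtein dynamic program over words. It is not necessarily how these numbers were produced, since such scripts often rely on a library such as `evaluate` or `jiwer`. It also assumes any text normalization has already been applied.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (costs 0 on a match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word errors divided by total reference words over the whole set."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


refs = ["the committee approved the report", "thank you"]
hyps = ["the committee approve the report today", "thank you"]
print(corpus_wer(refs, hyps))  # 1 substitution + 1 insertion over 7 reference words
```

Errors are pooled over the whole set before dividing, so long utterances weigh more than short ones. This is the usual way a single corpus-level WER such as 0.0855 is reported.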

eval_test_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 20.0,
+    "eval_samples": 1705,
+    "eval_test_loss": 1.0758554935455322,
+    "eval_test_runtime": 132.2526,
+    "eval_test_samples_per_second": 12.892,
+    "eval_test_steps_per_second": 0.136,
+    "eval_test_wer": 0.08848048503220916
+}
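The throughput fields in both evaluation files can be derived from the sample counts, the runtimes, and the evaluation batch size (`per_device_eval_batch_size: 96` in the run config). The quick arithmetic check below assumes a single device and takes the step count to be the number of batches, `ceil(samples / batch_size)`.

```python
import math
from typing import Tuple


def eval_speed(num_samples: int, runtime_s: float, batch_size: int) -> Tuple[float, float]:
    """Return (samples/s, steps/s) rounded to 3 decimals, as in the result files."""
    steps = math.ceil(num_samples / batch_size)
    return round(num_samples / runtime_s, 3), round(steps / runtime_s, 3)


print(eval_speed(1591, 121.5437, 96))  # dev split: (13.09, 0.14)
print(eval_speed(1705, 132.2526, 96))  # test split: (12.892, 0.136)
```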

generation_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "bos_token_id": 0,
+  "decoder_start_token_id": 2,
+  "early_stopping": true,
+  "eos_token_id": 2,
+  "forced_bos_token_id": 0,
+  "forced_eos_token_id": 2,
+  "max_length": 128,
+  "no_repeat_ngram_size": 3,
+  "num_beams": 4,
+  "pad_token_id": 1,
+  "transformers_version": "4.46.3"
+}
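`no_repeat_ngram_size: 3` forbids the 4-beam search from emitting any 3-gram twice. Below is a minimal sketch of the blocking rule that operates on plain lists of token ids. The function is hypothetical. transformers applies the same idea during generation by setting the scores of banned tokens to minus infinity.

```python
from typing import List, Set


def banned_next_tokens(generated: List[int], ngram_size: int) -> Set[int]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if len(generated) + 1 < ngram_size:
        return set()
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - ngram_size + 1:])
    banned = set()
    for start in range(len(generated) - ngram_size + 1):
        ngram = generated[start:start + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


# After "5 6 7 8 5 6", emitting 7 would repeat the 3-gram (5, 6, 7).
print(banned_next_tokens([5, 6, 7, 8, 5, 6], ngram_size=3))  # {7}
```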

merges.txt ADDED

The diff for this file is too large to render.

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:70aba1c61e1c77e1aca95ba945b9189c8a5d45eded939dd13ad43cd79379c1ae
+size 804433536
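The LFS pointer records only the object's hash and its byte size. That size is consistent with float32 storage (`torch_dtype: float32`). The W&B run config logs `model/num_parameters` as 201,096,832, and at 4 bytes each these account for all but about 46 kB of the file. The remainder is plausibly the safetensors JSON header. The quick check below assumes every parameter is saved exactly once.

```python
NUM_PARAMETERS = 201_096_832   # model/num_parameters from the W&B run config
BYTES_PER_FLOAT32 = 4
FILE_SIZE = 804_433_536        # size field of the model.safetensors LFS pointer

tensor_bytes = NUM_PARAMETERS * BYTES_PER_FLOAT32
overhead = FILE_SIZE - tensor_bytes
print(tensor_bytes, overhead)  # 804387328 46208
```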

preprocessor_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "do_normalize": true,
+  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+  "feature_size": 1,
+  "padding_side": "right",
+  "padding_value": 0,
+  "return_attention_mask": false,
+  "sampling_rate": 16000
+}
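With `do_normalize: true`, the feature extractor standardizes each waveform to zero mean and unit variance before it reaches the encoder. `return_attention_mask: false` means padded batches carry no mask, which matches the documented advice for wav2vec2 checkpoints that use group normalization (`feat_extract_norm: group`). Below is a minimal pure-Python sketch of the per-utterance normalization. The small epsilon added for numerical stability is an assumed value.

```python
import math
from typing import List


def zero_mean_unit_var(samples: List[float], eps: float = 1e-7) -> List[float]:
    """Standardize one utterance: subtract its mean, divide by its standard deviation."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return [(x - mean) / math.sqrt(var + eps) for x in samples]


normed = zero_mean_unit_var([0.1, -0.2, 0.4, 0.3])
print([round(x, 3) for x in normed])
```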

special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50264": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "errors": "replace",
+  "mask_token": "<mask>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "tokenizer_class": "BartTokenizer",
+  "trim_offsets": true,
+  "unk_token": "<unk>"
+}
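The odd-looking `model_max_length` is the exact integer value of the float `1e30`. transformers appears to use `int(1e30)` as a sentinel meaning that no maximum input length was recorded for the tokenizer. In practice, output length is bounded by the generation settings instead (`max_length: 128`). The equality can be checked directly:

```python
# The float 1e30 is not exactly representable; int() exposes its exact binary value.
sentinel = int(1e30)
print(sentinel)
print(sentinel == 1000000000000000019884624838656)  # True
```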

train_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 20.0,
+    "total_flos": 0.0,
+    "train_loss": 1.6298207611684346,
+    "train_runtime": 35628.5116,
+    "train_samples": 167046,
+    "train_samples_per_second": 93.771,
+    "train_steps_per_second": 0.977
+}
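The training figures fit together as 20 epochs over 167,046 samples with `per_device_train_batch_size: 96` and `gradient_accumulation_steps: 1`, assuming a single device. Under that assumption, one epoch is `ceil(167046 / 96) = 1741` optimizer steps, giving 34,820 steps in total. This also places `checkpoint-29000` at roughly epoch 16.7. A quick check:

```python
import math

train_samples = 167_046
epochs = 20
batch_size = 96            # per_device_train_batch_size, single device assumed
runtime_s = 35_628.5116    # train_runtime

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)                  # 1741 34820
print(round(train_samples * epochs / runtime_s, 3))  # 93.771, matches train_samples_per_second
print(round(total_steps / runtime_s, 3))             # 0.977, matches train_steps_per_second
```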

trainer_state.json ADDED

The diff for this file is too large to render.

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1f161da6617cf4269393afbc8cb8565cefc83bb8de7610adde49de693563d3f3
+size 5624
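Large binaries such as `training_args.bin` are stored as Git LFS pointer files. Each pointer has three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. Below is a minimal parser sketch that assumes the pointer text is already in memory. The helper name is hypothetical.

```python
from typing import Dict

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1f161da6617cf4269393afbc8cb8565cefc83bb8de7610adde49de693563d3f3
size 5624
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each non-empty 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


info = parse_lfs_pointer(POINTER)
algo, _, digest = info["oid"].partition(":")
print(algo, len(digest), int(info["size"]))  # sha256 64 5624
```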

vocab.json ADDED

The diff for this file is too large to render.

wandb/.DS_Store ADDED

Binary file (6.15 kB).

wandb/run-20250515_192303-7xkscxrj/files/config.yaml ADDED

	@@ -0,0 +1,1039 @@

+_attn_implementation_autoset:
+    value: true
+_name_or_path:
+    value: ./seq2seq_wav2vec2_bart-base_24k-en-voxpopuli
+_wandb:
+    value:
+        cli_version: 0.19.7
+        m:
+            - "1": train/global_step
+              "6":
+                - 3
+              "7": []
+            - "1": eval/wer
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/dev_runtime
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size1_table.path
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/test_steps_per_second
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": train/learning_rate
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": substitutions
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/samples_per_second
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/test_runtime
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/dev_loss
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/test_samples_per_second
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size2_table.sha256
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size2_table._latest_artifact_path
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size1_table._type
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size1_table.artifact_path
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": word_accuracy
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/loss
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/steps_per_second
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/test_loss
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size1_table.nrows
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size1_table.sha256
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size1_table.size
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size2_table.path
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": train/epoch
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": insertions
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": word_errors
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/runtime
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size2_table.ncols
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size2_table._type
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size2_table.size
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size2_table.artifact_path
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": test_sample_index
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": train/loss
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": sentence_errors
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/dev_samples_per_second
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size1_table.ncols
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": train/grad_norm
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": deletions
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/dev_wer
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size2_table.nrows
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/dev_steps_per_second
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": eval/test_wer
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+            - "1": model_speed2size1_table._latest_artifact_path
+              "5": 1
+              "6":
+                - 1
+                - 3
+              "7": []
+        python_version: 3.11.11
+        t:
+            "1":
+                - 1
+                - 5
+                - 11
+                - 41
+                - 49
+                - 51
+                - 53
+                - 55
+                - 71
+                - 98
+                - 100
+            "2":
+                - 1
+                - 5
+                - 11
+                - 41
+                - 49
+                - 51
+                - 53
+                - 55
+                - 71
+                - 98
+                - 100
+            "3":
+                - 2
+                - 7
+                - 13
+                - 19
+                - 23
+                - 55
+                - 62
+                - 66
+            "4": 3.11.11
+            "5": 0.19.7
+            "6": 4.46.3
+            "8":
+                - 5
+            "9":
+                "1": transformers_trainer
+            "12": 0.19.7
+            "13": linux-x86_64
+        visualize:
+            model_speed2size1:
+                panel_config:
+                    fieldSettings:
+                        x: Time per step
+                        "y": Trainable parameters
+                    panelDefId: wandb/scatter/v0
+                    stringSettings:
+                        title: ""
+                    transform:
+                        name: tableWithLeafColNames
+                    userQuery:
+                        queryFields:
+                            - args:
+                                - name: runSets
+                                  value: ${runSets}
+                              fields:
+                                - fields: []
+                                  name: id
+                                - fields: []
+                                  name: name
+                                - fields: []
+                                  name: _defaultColorIndex
+                                - args:
+                                    - name: tableKey
+                                      value: model_speed2size1_table
+                                  fields: []
+                                  name: summaryTable
+                              name: runSets
+                panel_type: Vega2
+            model_speed2size2:
+                panel_config:
+                    fieldSettings:
+                        x: Time per step
+                        "y": Total parameters
+                    panelDefId: wandb/scatter/v0
+                    stringSettings:
+                        title: ""
+                    transform:
+                        name: tableWithLeafColNames
+                    userQuery:
+                        queryFields:
+                            - args:
+                                - name: runSets
+                                  value: ${runSets}
+                              fields:
+                                - fields: []
+                                  name: id
+                                - fields: []
+                                  name: name
+                                - fields: []
+                                  name: _defaultColorIndex
+                                - args:
+                                    - name: tableKey
+                                      value: model_speed2size2_table
+                                  fields: []
+                                  name: summaryTable
+                              name: runSets
+                panel_type: Vega2
+accelerator_config:
+    value:
+        dispatch_batches: null
+        even_batches: true
+        gradient_accumulation_kwargs: null
+        non_blocking: false
+        split_batches: false
+        use_seedable_sampler: true
+adafactor:
+    value: false
+adam_beta1:
+    value: 0.9
+adam_beta2:
+    value: 0.999
+adam_epsilon:
+    value: 1e-08
+add_cross_attention:
+    value: false
+architectures:
+    value:
+        - SpeechEncoderDecoderModel
+auto_find_batch_size:
+    value: false
+average_tokens_across_devices:
+    value: false
+bad_words_ids:
+    value: null
+batch_eval_metrics:
+    value: false
+begin_suppress_tokens:
+    value: null
+bf16:
+    value: true
+bf16_full_eval:
+    value: false
+bos_token_id:
+    value: null
+chunk_size_feed_forward:
+    value: 0
+cross_attention_hidden_size:
+    value: null
+data_seed:
+    value: null
+dataloader_drop_last:
+    value: false
+dataloader_num_workers:
+    value: 16
+dataloader_persistent_workers:
+    value: false
+dataloader_pin_memory:
+    value: true
+dataloader_prefetch_factor:
+    value: 2
+ddp_backend:
+    value: null
+ddp_broadcast_buffers:
+    value: null
+ddp_bucket_cap_mb:
+    value: null
+ddp_find_unused_parameters:
+    value: null
+ddp_timeout:
+    value: 1800
+debug:
+    value: []
+decoder:
+    value:
+        _attn_implementation_autoset: true
+        _name_or_path: facebook/bart-base
+        activation_dropout: 0.1
+        activation_function: gelu
+        add_bias_logits: false
+        add_cross_attention: true
+        add_final_layer_norm: false
+        architectures:
+            - BartModel
+        attention_dropout: 0.1
+        bad_words_ids: null
+        begin_suppress_tokens: null
+        bos_token_id: 0
+        chunk_size_feed_forward: 0
+        classif_dropout: 0.1
+        classifier_dropout: 0
+        cross_attention_hidden_size: null
+        d_model: 768
+        decoder_attention_heads: 12
+        decoder_ffn_dim: 3072
+        decoder_layerdrop: 0
+        decoder_layers: 6
+        decoder_start_token_id: 2
+        diversity_penalty: 0
+        do_sample: false
+        dropout: 0.1
+        early_stopping: true
+        encoder_attention_heads: 12
+        encoder_ffn_dim: 3072
+        encoder_layerdrop: 0
+        encoder_layers: 6
+        encoder_no_repeat_ngram_size: 0
+        eos_token_id: 2
+        exponential_decay_length_penalty: null
+        finetuning_task: null
+        forced_bos_token_id: 0
+        forced_eos_token_id: 2
+        gradient_checkpointing: false
+        id2label:
+            "0": LABEL_0
+            "1": LABEL_1
+            "2": LABEL_2
+        init_std: 0.02
+        is_decoder: true
+        is_encoder_decoder: false
+        label2id:
+            LABEL_0: 0
+            LABEL_1: 1
+            LABEL_2: 2
+        length_penalty: 1
+        max_length: 20
+        max_position_embeddings: 1024
+        min_length: 0
+        model_type: bart
+        no_repeat_ngram_size: 3
+        normalize_before: false
+        normalize_embedding: true
+        num_beam_groups: 1
+        num_beams: 4
+        num_hidden_layers: 6
+        num_return_sequences: 1
+        output_attentions: false
+        output_hidden_states: false
+        output_scores: false
+        pad_token_id: 1
+        prefix: null
+        problem_type: null
+        remove_invalid_values: false
+        repetition_penalty: 1
+        return_dict: true
+        return_dict_in_generate: false
+        scale_embedding: false
+        sep_token_id: null
+        suppress_tokens: null
+        task_specific_params:
+            summarization:
+                length_penalty: 1
+                max_length: 128
+                min_length: 12
+                num_beams: 4
+            summarization_cnn:
+                length_penalty: 2
+                max_length: 142
+                min_length: 56
+                num_beams: 4
+            summarization_xsum:
+                length_penalty: 1
+                max_length: 62
+                min_length: 11
+                num_beams: 6
+        temperature: 1
+        tf_legacy_loss: false
+        tie_encoder_decoder: false
+        tie_word_embeddings: true
+        tokenizer_class: null
+        top_k: 50
+        top_p: 1
+        torch_dtype: float32
+        torchscript: false
+        typical_p: 1
+        use_bfloat16: false
+        use_cache: true
+        vocab_size: 50265
+decoder_start_token_id:
+    value: 0
+deepspeed:
+    value: null
+disable_tqdm:
+    value: false
+dispatch_batches:
+    value: null
+diversity_penalty:
+    value: 0
+do_eval:
+    value: true
+do_predict:
+    value: true
+do_sample:
+    value: false
+do_train:
+    value: true
+early_stopping:
+    value: false
+encoder:
+    value:
+        _attn_implementation_autoset: true
+        _name_or_path: facebook/wav2vec2-base-en-voxpopuli-v2
+        activation_dropout: 0
+        adapter_attn_dim: null
+        adapter_kernel_size: 3
+        adapter_stride: 2
+        add_adapter: true
+        add_cross_attention: false
+        apply_spec_augment: true
+        architectures:
+            - Wav2Vec2ForPreTraining
+        attention_dropout: 0.1
+        bad_words_ids: null
+        begin_suppress_tokens: null
+        bos_token_id: 1
+        chunk_size_feed_forward: 0
+        classifier_proj_size: 256
+        codevector_dim: 256
+        contrastive_logits_temperature: 0.1
+        conv_bias: false
+        conv_dim:
+            - 512
+            - 512
+            - 512
+            - 512
+            - 512
+            - 512
+            - 512
+        conv_kernel:
+            - 10
+            - 3
+            - 3
+            - 3
+            - 3
+            - 2
+            - 2
+        conv_stride:
+            - 5
+            - 2
+            - 2
+            - 2
+            - 2
+            - 2
+            - 2
+        cross_attention_hidden_size: null
+        ctc_loss_reduction: sum
+        ctc_zero_infinity: false
+        decoder_start_token_id: null
+        diversity_loss_weight: 0.1
+        diversity_penalty: 0
+        do_sample: false
+        do_stable_layer_norm: false
+        early_stopping: false
+        encoder_no_repeat_ngram_size: 0
+        eos_token_id: 2
+        exponential_decay_length_penalty: null
+        feat_extract_activation: gelu
+        feat_extract_norm: group
+        feat_proj_dropout: 0
+        feat_quantizer_dropout: 0
+        final_dropout: 0
+        finetuning_task: null
+        forced_bos_token_id: null
+        forced_eos_token_id: null
+        freeze_feat_extract_train: true
+        hidden_act: gelu
+        hidden_dropout: 0.1
+        hidden_size: 768
+        id2label:
+            "0": LABEL_0
+            "1": LABEL_1
+        initializer_range: 0.02
+        intermediate_size: 3072
+        is_decoder: false
+        is_encoder_decoder: false
+        label2id:
+            LABEL_0: 0
+            LABEL_1: 1
+        layer_norm_eps: 1e-05
+        layerdrop: 0
+        length_penalty: 1
+        mask_channel_length: 10
+        mask_channel_min_space: 1
+        mask_channel_other: 0
+        mask_channel_prob: 0
+        mask_channel_selection: static
+        mask_feature_length: 30
+        mask_feature_min_masks: 1
+        mask_feature_prob: 0.3
+        mask_time_length: 30
+        mask_time_min_masks: 2
+        mask_time_min_space: 1
+        mask_time_other: 0
+        mask_time_prob: 0.25
+        mask_time_selection: static
+        max_length: 20
+        min_length: 0
+        model_type: wav2vec2
+        no_mask_channel_overlap: false
+        no_mask_time_overlap: false
+        no_repeat_ngram_size: 0
+        num_adapter_layers: 3
+        num_attention_heads: 12
+        num_beam_groups: 1
+        num_beams: 1
+        num_codevector_groups: 2
+        num_codevectors_per_group: 320
+        num_conv_pos_embedding_groups: 16
+        num_conv_pos_embeddings: 128
+        num_feat_extract_layers: 7
+        num_hidden_layers: 12
+        num_negatives: 100
+        num_return_sequences: 1
+        output_attentions: false
+        output_hidden_size: 768
+        output_hidden_states: false
+        output_scores: false
+        pad_token_id: 0
+        prefix: null
+        problem_type: null
+        proj_codevector_dim: 256
+        remove_invalid_values: false
+        repetition_penalty: 1
+        return_dict: true
+        return_dict_in_generate: false
+        sep_token_id: null
+        suppress_tokens: null
+        task_specific_params: null
+        tdnn_dilation:
+            - 1
+            - 2
+            - 3
+            - 1
+            - 1
+        tdnn_dim:
+            - 512
+            - 512
+            - 512
+            - 512
+            - 1500
+        tdnn_kernel:
+            - 5
+            - 3
+            - 3
+            - 1
+            - 1
+        temperature: 1
+        tf_legacy_loss: false
+        tie_encoder_decoder: false
+        tie_word_embeddings: true
+        tokenizer_class: null
+        top_k: 50
+        top_p: 1
+        torch_dtype: float32
+        torchscript: false
+        typical_p: 1
+        use_bfloat16: false
+        use_weighted_layer_sum: false
+        vocab_size: 32
+        xvector_output_dim: 512
+encoder_no_repeat_ngram_size:
+    value: 0
+eos_token_id:
+    value: 2
+eval_accumulation_steps:
+    value: null
+eval_delay:
+    value: 0
+eval_do_concat_batches:
+    value: true
+eval_on_start:
+    value: false
+eval_steps:
+    value: 1000
+eval_strategy:
+    value: steps
+eval_use_gather_object:
+    value: false
+evaluation_strategy:
+    value: null
+exponential_decay_length_penalty:
+    value: null
+finetuning_task:
+    value: null
+forced_bos_token_id:
+    value: null
+forced_decoder_ids:
+    value: null
+forced_eos_token_id:
+    value: null
+fp16:
+    value: false
+fp16_backend:
+    value: auto
+fp16_full_eval:
+    value: false
+fp16_opt_level:
+    value: O1
+fsdp:
+    value: []
+fsdp_config:
+    value:
+        min_num_params: 0
+        xla: false
+        xla_fsdp_grad_ckpt: false
+        xla_fsdp_v2: false
+fsdp_min_num_params:
+    value: 0
+fsdp_transformer_layer_cls_to_wrap:
+    value: null
+full_determinism:
+    value: false
+generation_config:
+    value: null
+generation_max_length:
+    value: null
+generation_num_beams:
+    value: null
+gradient_accumulation_steps:
+    value: 1
+gradient_checkpointing:
+    value: false
+gradient_checkpointing_kwargs:
+    value: null
+greater_is_better:
+    value: false
+group_by_length:
+    value: false
+half_precision_backend:
+    value: auto
+hub_always_push:
+    value: false
+hub_model_id:
+    value: null
+hub_private_repo:
+    value: false
+hub_strategy:
+    value: every_save
+hub_token:
+    value: <HUB_TOKEN>
+id2label:
+    value:
+        "0": LABEL_0
+        "1": LABEL_1
+ignore_data_skip:
+    value: false
+include_for_metrics:
+    value: []
+include_inputs_for_metrics:
+    value: false
+include_num_input_tokens_seen:
+    value: false
+include_tokens_per_second:
+    value: false
+is_decoder:
+    value: false
+is_encoder_decoder:
+    value: true
+jit_mode_eval:
+    value: false
+label_names:
+    value: null
+label_smoothing_factor:
+    value: 0.05
+label2id:
+    value:
+        LABEL_0: 0
+        LABEL_1: 1
+learning_rate:
+    value: 0.0001
+length_column_name:
+    value: input_length
+length_penalty:
+    value: 1
+load_best_model_at_end:
+    value: true
+local_rank:
+    value: 0
+log_level:
+    value: passive
+log_level_replica:
+    value: warning
+log_on_each_node:
+    value: true
+logging_dir:
+    value: ./seq2seq_wav2vec2_bart-base_24k-en-voxpopuli/t1_new1_spec/runs/May15_19-23-02_achjo
+logging_first_step:
+    value: false
+logging_nan_inf_filter:
+    value: true
+logging_steps:
+    value: 10
+logging_strategy:
+    value: steps
+lr_scheduler_kwargs:
+    value:
+        min_lr: 5e-09
+lr_scheduler_type:
+    value: cosine_with_min_lr
+max_grad_norm:
+    value: 1
+max_length:
+    value: null
+max_steps:
+    value: -1
+metric_for_best_model:
+    value: wer
+min_length:
+    value: 0
+model/num_parameters:
+    value: 201096832
+model_type:
+    value: speech-encoder-decoder
+mp_parameters:
+    value: ""
+neftune_noise_alpha:
+    value: null
+no_cuda:
+    value: false
+no_repeat_ngram_size:
+    value: 0
+num_beam_groups:
+    value: 1
+num_beams:
+    value: 1
+num_return_sequences:
+    value: 1
+num_train_epochs:
+    value: 20
+optim:
+    value: adamw_torch
+optim_args:
+    value: null
+optim_target_modules:
+    value: null
+output_attentions:
+    value: false
+output_dir:
+    value: ./seq2seq_wav2vec2_bart-base_24k-en-voxpopuli/t1_new1_spec
+output_hidden_states:
+    value: false
+output_scores:
+    value: false
+overwrite_output_dir:
+    value: true
+pad_token_id:
+    value: 1
+past_index:
+    value: -1
+per_device_eval_batch_size:
+    value: 96
+per_device_train_batch_size:
+    value: 96
+per_gpu_eval_batch_size:
+    value: null
+per_gpu_train_batch_size:
+    value: null
+predict_with_generate:
+    value: true
+prediction_loss_only:
+    value: false
+prefix:
+    value: null
+problem_type:
+    value: null
+processor_class:
+    value: Wav2Vec2Processor
+push_to_hub:
+    value: false
+push_to_hub_model_id:
+    value: null
+push_to_hub_organization:
+    value: null
+push_to_hub_token:
+    value: <PUSH_TO_HUB_TOKEN>
+ray_scope:
+    value: last
+remove_invalid_values:
+    value: false
+remove_unused_columns:
+    value: true
+repetition_penalty:
+    value: 1
+report_to:
+    value:
+        - wandb
+restore_callback_states_from_checkpoint:
+    value: false
+resume_from_checkpoint:
+    value: null
+return_dict:
+    value: true
+return_dict_in_generate:
+    value: false
+run_name:
+    value: facebook/voxpopuli_en_split-train_wav2vec2-bart_bs96_lr0.0001_ep20.0
+save_on_each_node:
+    value: false
+save_only_model:
+    value: false
+save_safetensors:
+    value: true
+save_steps:
+    value: 1000
+save_strategy:
+    value: steps
+save_total_limit:
+    value: 1
+seed:
+    value: 42
+sep_token_id:
+    value: null
+skip_memory_metrics:
+    value: true
+sortish_sampler:
+    value: false
+split_batches:
+    value: null
+suppress_tokens:
+    value: null
+task_specific_params:
+    value: null
+temperature:
+    value: 1
+tf_legacy_loss:
+    value: false
+tf32:
+    value: null
+tie_encoder_decoder:
+    value: false
+tie_word_embeddings:
+    value: false
+tokenizer_class:
+    value: null
+top_k:
+    value: 50
+top_p:
+    value: 1
+torch_compile:
+    value: false
+torch_compile_backend:
+    value: null
+torch_compile_mode:
+    value: null
+torch_dtype:
+    value: float32
+torch_empty_cache_steps:
+    value: null
+torchdynamo:
+    value: null
+torchscript:
+    value: false
+tpu_metrics_debug:
+    value: false
+tpu_num_cores:
+    value: null
+transformers_version:
+    value: 4.46.3
+typical_p:
+    value: 1
+use_bfloat16:
+    value: false
+use_cache:
+    value: false
+use_cpu:
+    value: false
+use_ipex:
+    value: false
+use_legacy_prediction_loop:
+    value: false
+use_liger_kernel:
+    value: false
+use_mps_device:
+    value: false
+warmup_ratio:
+    value: 0
+warmup_steps:
+    value: 2000
+weight_decay:
+    value: 0.01
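The training arguments above (`per_device_train_batch_size: 96`, `num_train_epochs: 20`, gradient accumulation of 1 on the single A100 listed in `wandb-metadata.json`) can be cross-checked against the run summary. The sketch below assumes that `train_samples_per_second` in `wandb-summary.json` was computed over `epochs * len(train_set)` samples, and that `dataloader_drop_last: False` keeps the final partial batch. It then estimates the training-set size and the number of optimizer steps:

```python
import math

# Training arguments (config.yaml / wandb-metadata.json)
per_device_batch = 96
grad_accum = 1
num_gpus = 1          # wandb-metadata.json reports gpu_count: 1
epochs = 20

# Run summary (wandb-summary.json)
train_samples_per_second = 93.771
train_runtime = 35628.5116   # seconds
global_step = 34820

effective_batch = per_device_batch * grad_accum * num_gpus

# Assumption: samples/sec was computed as (epochs * len(train_set)) / runtime
est_train_set = train_samples_per_second * train_runtime / epochs

# drop_last is False, so a partial last batch still counts as one step
steps_per_epoch = math.ceil(est_train_set / effective_batch)

print(f"effective batch size: {effective_batch}")
print(f"estimated training examples per epoch: {est_train_set:,.0f}")
print(f"steps per epoch: {steps_per_epoch}, total: {steps_per_epoch * epochs} (logged: {global_step})")
```

Because the logged rate is rounded to three decimals, the training-set estimate is only approximate. It is still precise enough to reproduce the logged step count.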

wandb/run-20250515_192303-7xkscxrj/files/media/table/model_speed2size1_table_3555_34483c9cf24b143db620.table.json ADDED

	@@ -0,0 +1 @@


+ {"columns": ["Time per step", "Trainable parameters"], "data": [[0.054152575027465746, 196896384]]}

wandb/run-20250515_192303-7xkscxrj/files/media/table/model_speed2size2_table_3556_ffc3f22eaf8a279337f3.table.json ADDED

	@@ -0,0 +1 @@


+ {"columns": ["Time per step", "Total parameters"], "data": [[0.054152575027465746, 201096832]]}
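The two tables log the same time per step but different parameter counts. The gap of 4,200,448 parameters appears to be the convolutional feature encoder frozen by `--freeze_feature_encoder`. The minimal sketch below recomputes that count. It assumes the Hugging Face wav2vec2 layout: bias-free `Conv1d` layers (`conv_bias: False`) and, for `feat_extract_norm: 'group'`, a single affine `GroupNorm` after the first convolution only. The layer sizes are copied from the encoder config in `debug.log`, and the helper name `feature_encoder_params` is hypothetical.

```python
# Feature-encoder settings copied from the encoder config logged in debug.log
conv_dim = [512, 512, 512, 512, 512, 512, 512]
conv_kernel = [10, 3, 3, 3, 3, 2, 2]
conv_bias = False
feat_extract_norm = "group"


def feature_encoder_params() -> int:
    """Count parameters of the wav2vec2 convolutional feature encoder (assumed layout)."""
    total = 0
    in_channels = 1  # the raw waveform enters as a single channel
    for out_channels, kernel in zip(conv_dim, conv_kernel):
        total += in_channels * out_channels * kernel  # Conv1d weights
        if conv_bias:
            total += out_channels  # Conv1d biases (absent here)
        in_channels = out_channels
    if feat_extract_norm == "group":
        total += 2 * conv_dim[0]  # GroupNorm weight + bias on the first layer only
    return total


total_params = 201_096_832      # model_speed2size2 table / model/num_parameters
trainable_params = 196_896_384  # model_speed2size1 table

print(feature_encoder_params(), total_params - trainable_params)
```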

wandb/run-20250515_192303-7xkscxrj/files/output.log ADDED

The diff for this file is too large to render.

wandb/run-20250515_192303-7xkscxrj/files/requirements.txt ADDED

	@@ -0,0 +1,184 @@

+seaborn==0.13.2
+certifi==2024.12.14
+aiohappyeyeballs==2.4.4
+filelock==3.16.1
+executing==2.1.0
+nvidia-nvjitlink-cu12==12.4.127
+SecretStorage==3.3.3
+mkl_fft==1.3.11
+installer==0.7.0
+cycler==0.12.1
+keyring==25.6.0
+idna==3.10
+mpmath==1.3.0
+prompt_toolkit==3.0.48
+urllib3==2.3.0
+aiohttp==3.11.11
+jaraco.classes==3.4.0
+RapidFuzz==3.11.0
+triton==3.1.0
+click==8.1.8
+regex==2024.11.6
+joblib==1.4.2
+pyparsing==3.2.0
+attrs==24.3.0
+typing_extensions==4.12.2
+jedi==0.19.2
+pkginfo==1.12.1.2
+stack-data==0.6.3
+huggingface-hub==0.27.0
+multidict==6.1.0
+fastjsonschema==2.21.1
+cleo==2.1.0
+pydantic_core==2.27.2
+zipp==3.21.0
+python-dateutil==2.9.0.post0
+trove-classifiers==2025.5.1.12
+contourpy==1.3.1
+torchaudio==2.5.1
+annotated-types==0.7.0
+scikit-learn==1.6.0
+lazy_loader==0.4
+smmap==5.0.1
+jiwer==3.0.5
+requests==2.32.3
+gitdb==4.0.11
+numpy==2.0.2
+sentry-sdk==2.19.2
+gmpy2==2.2.1
+sniffio==1.3.1
+build==1.2.2.post1
+nvidia-nvtx-cu12==12.4.127
+nvidia-nccl-cu12==2.21.5
+traitlets==5.14.3
+nvidia-cuda-runtime-cu12==12.4.127
+pillow==11.0.0
+packaging==24.2
+jeepney==0.9.0
+pexpect==4.9.0
+accelerate==1.4.0
+httpx==0.28.1
+jaraco.context==6.0.1
+multiprocess==0.70.16
+torchvision==0.20.1
+virtualenv==20.31.0
+nvidia-cufft-cu12==11.2.1.3
+xxhash==3.5.0
+sympy==1.13.1
+tqdm==4.67.1
+pbs-installer==2025.4.9
+wheel==0.45.1
+pyzmq==26.2.0
+pyarrow==18.1.0
+importlib_metadata==8.7.0
+pure_eval==0.2.3
+tomlkit==0.13.2
+pandas==2.2.3
+safetensors==0.4.5
+crashtest==0.4.1
+propcache==0.2.1
+comm==0.2.2
+ipython==8.31.0
+protobuf==5.29.2
+mkl-service==2.4.0
+cffi==1.17.1
+PySocks==1.7.1
+networkx==3.4.2
+poetry==2.1.3
+debugpy==1.8.11
+GitPython==3.1.43
+pyproject_hooks==1.2.0
+ptyprocess==0.7.0
+requests-toolbelt==1.0.0
+setproctitle==1.3.4
+ipykernel==6.29.5
+pydantic-settings==2.7.0
+nvidia-cuda-cupti-cu12==12.4.127
+threadpoolctl==3.5.0
+jaraco.functools==4.1.0
+tokenizers==0.20.3
+python-dotenv==1.0.1
+numba==0.60.0
+dill==0.3.8
+msgpack==1.1.0
+tzdata==2024.2
+audioread==3.0.1
+pip==25.1
+nvidia-cusolver-cu12==11.6.1.9
+yarl==1.18.3
+pydantic==2.10.4
+shellingham==1.5.4
+librosa==0.10.2.post1
+Pygments==2.18.0
+docker-pycreds==0.4.0
+fsspec==2024.9.0
+anyio==4.9.0
+fonttools==4.55.3
+more-itertools==10.7.0
+tornado==6.4.2
+backports.tarfile==1.2.0
+transformers==4.46.3
+dulwich==0.22.8
+psutil==6.1.1
+nvidia-cuda-nvrtc-cu12==12.4.127
+six==1.17.0
+wcwidth==0.2.13
+asttokens==3.0.0
+platformdirs==4.3.6
+jupyter_client==8.6.3
+pytz==2024.2
+decorator==5.1.1
+nvidia-cublas-cu12==12.4.5.8
+matplotlib==3.10.0
+pooch==1.8.2
+aiosignal==1.3.2
+httpcore==1.0.9
+Brotli==1.0.9
+parso==0.8.4
+nvidia-cusparse-cu12==12.3.1.170
+Jinja2==3.1.5
+datasets==3.5.1
+poetry-core==2.1.3
+PyYAML==6.0.2
+MarkupSafe==3.0.2
+mkl_random==1.2.8
+evaluate==0.4.3
+matplotlib-inline==0.1.7
+frozenlist==1.5.0
+kiwisolver==1.4.7
+zstandard==0.23.0
+nvidia-curand-cu12==10.3.5.147
+soxr==0.5.0.post1
+CacheControl==0.14.3
+soundfile==0.12.1
+h11==0.16.0
+jupyter_core==5.7.2
+pycparser==2.22
+nvidia-cudnn-cu12==9.1.0.70
+peft==0.14.0
+scipy==1.14.1
+wandb==0.19.7
+charset-normalizer==3.4.0
+cryptography==44.0.3
+distlib==0.3.9
+findpython==0.6.3
+setuptools==75.6.0
+torch==2.5.1
+llvmlite==0.43.0
+nest-asyncio==1.6.0
+more-itertools==10.3.0
+inflect==7.3.1
+typing_extensions==4.12.2
+jaraco.context==5.3.0
+tomli==2.0.1
+platformdirs==4.2.2
+zipp==3.19.2
+jaraco.functools==4.0.1
+packaging==24.2
+typeguard==4.3.0
+wheel==0.43.0
+autocommand==2.2.2
+backports.tarfile==1.2.0
+jaraco.collections==5.1.0
+jaraco.text==3.12.1
+importlib_metadata==8.0.0

wandb/run-20250515_192303-7xkscxrj/files/wandb-metadata.json ADDED

	@@ -0,0 +1,96 @@

+{
+  "os":  "Linux-5.15.0-1079-azure-x86_64-with-glibc2.31",
+  "python":  "CPython 3.11.11",
+  "startedAt":  "2025-05-15T19:23:03.492613Z",
+  "args":  [
+    "--dataset_name=facebook/voxpopuli",
+    "--model_name_or_path=./seq2seq_wav2vec2_bart-base_24k-en-voxpopuli",
+    "--dataset_config_name=en",
+    "--train_split_name=train",
+    "--eval_split_name=validation",
+    "--test_split_name=test",
+    "--output_dir=./seq2seq_wav2vec2_bart-base_24k-en-voxpopuli/t1_new1_spec",
+    "--preprocessing_num_workers=1",
+    "--dataloader_num_workers=16",
+    "--dataloader_prefetch_factor=2",
+    "--length_column_name=input_length",
+    "--overwrite_output_dir",
+    "--num_train_epochs=20",
+    "--per_device_train_batch_size=96",
+    "--per_device_eval_batch_size=96",
+    "--gradient_accumulation_steps=1",
+    "--learning_rate=1e-4",
+    "--label_smoothing_factor=0.05",
+    "--apply_spec_augment",
+    "--mask_time_prob=0.25",
+    "--mask_time_length=30",
+    "--mask_time_min_masks=2",
+    "--mask_feature_prob=0.3",
+    "--mask_feature_length=30",
+    "--mask_feature_min_masks=1",
+    "--weight_decay=0.01",
+    "--lr_scheduler_type=cosine_with_min_lr",
+    "--lr_scheduler_kwargs={\"min_lr\": 5e-9}",
+    "--warmup_steps=2000",
+    "--eval_strategy=steps",
+    "--text_column_name=normalized_text",
+    "--save_strategy=steps",
+    "--eval_steps=1000",
+    "--save_steps=1000",
+    "--load_best_model_at_end",
+    "--metric_for_best_model=wer",
+    "--greater_is_better=False",
+    "--logging_steps=10",
+    "--save_total_limit=1",
+    "--freeze_feature_encoder",
+    "--bf16",
+    "--task=transcribe",
+    "--predict_with_generate",
+    "--do_train",
+    "--do_eval",
+    "--do_predict",
+    "--do_lower_case",
+    "--trust_remote_code",
+    "--report_to=wandb",
+    "--sclite_path=/home/azureuser/media-disk/mh_dp/SCTK/bin/sclite",
+    "--wandb_project=seq2seq_encoder-decoder_vox",
+    "--cache_dir=/home/azureuser/media-disk/mh_dp/preprocessed_dataset_voxpopuli"
+  ],
+  "program":  "/media/disk/mh_dp/run_speech_recognition_seq2seq.py",
+  "codePath":  "run_speech_recognition_seq2seq.py",
+  "git":  {
+    "remote":  "https://github.com/hornikmatej/thesis_mit.git",
+    "commit":  "f785b399a218c31f74efa57fa6057a8f5848df90"
+  },
+  "email":  "[email protected]",
+  "root":  "/media/disk/mh_dp",
+  "host":  "achjo",
+  "executable":  "/media/disk/conda-envs/mh_dp/bin/python",
+  "codePathLocal":  "run_speech_recognition_seq2seq.py",
+  "cpu_count":  24,
+  "cpu_count_logical":  24,
+  "gpu":  "NVIDIA A100 80GB PCIe",
+  "gpu_count":  1,
+  "disk":  {
+    "/":  {
+      "total":  "126759518208",
+      "used":  "121216040960"
+    }
+  },
+  "memory":  {
+    "total":  "232206929920"
+  },
+  "cpu":  {
+    "count":  24,
+    "countLogical":  24
+  },
+  "gpu_nvidia":  [
+    {
+      "name":  "NVIDIA A100 80GB PCIe",
+      "memoryTotal":  "85899345920",
+      "cudaCores":  6912,
+      "architecture":  "Ampere"
+    }
+  ],
+  "cudaVersion":  "12.4"
+}
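The scheduler flags above (`--learning_rate=1e-4`, `--warmup_steps=2000`, `--lr_scheduler_type=cosine_with_min_lr`, `--lr_scheduler_kwargs={"min_lr": 5e-9}`) describe a linear warmup followed by a cosine decay to a floor. Below is a minimal sketch of that curve. It assumes the schedule has the same shape as transformers' `get_cosine_with_min_lr_schedule_with_warmup` with its default half cosine cycle. The total of 34,820 steps is taken from `train/global_step` in `wandb-summary.json`. The helper name `cosine_with_min_lr` is hypothetical.

```python
import math


def cosine_with_min_lr(step: int, base_lr: float = 1e-4, min_lr: float = 5e-9,
                       warmup_steps: int = 2000, total_steps: int = 34820) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr over the warmup steps
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    min_ratio = min_lr / base_lr
    # Rescale so the curve ends at min_lr instead of 0
    return base_lr * (cosine * (1.0 - min_ratio) + min_ratio)


for s in (0, 1000, 2000, 18410, 34820):
    print(s, f"{cosine_with_min_lr(s):.3e}")
```

Under these assumptions, the rate peaks at 1e-4 at step 2,000 and reaches 5e-9 at the last step. That endpoint matches the final `train/learning_rate` of 5e-09 in the run summary.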

wandb/run-20250515_192303-7xkscxrj/files/wandb-summary.json ADDED

	@@ -0,0 +1 @@

+ {"eval/loss":1.0561457872390747,"train_samples_per_second":93.771,"train_runtime":35628.5116,"deletions":2.1,"train/grad_norm":3.1310455799102783,"sentence_errors":69.79,"eval/dev_wer":0.08554638942253362,"train/learning_rate":5e-09,"eval/dev_runtime":121.5437,"eval/test_runtime":132.2526,"train/epoch":20,"eval/test_samples_per_second":12.892,"eval/wer":0.08608317323991412,"eval/runtime":122.4086,"_runtime":35917.482322344,"model_speed2size1_table":{"artifact_path":"wandb-client-artifact://22ryvcq5sfoxrkdlg5snrezdqeux1vlyv16ngbo07h3jbzffqjwebkvyz3zmsi2e0yxfoypfkb3m9qh2gmzp1wad6w8ijltxguisyd9kp4eji86yqh2tjqpzuu78dv8a/model_speed2size1_table.table.json","_latest_artifact_path":"wandb-client-artifact://bmhjxqeluon6nds3foy3om07pd3ur8vjndrhq5f4l4sfxeikus6kk0lsyxy74p6omxe5ipufpt1xab1spvmr9j535tlw1jd59813g4tl3ud02gtm9j0wv7ipfp9b61wq:latest/model_speed2size1_table.table.json","path":"media/table/model_speed2size1_table_3555_34483c9cf24b143db620.table.json","ncols":2,"nrows":1,"_type":"table-file","sha256":"34483c9cf24b143db6206cc8a337dd90c28037eadcc75c16498b0e74c09b0a94","size":99},"eval/dev_steps_per_second":0.14,"train/global_step":34820,"eval/dev_loss":1.0564184188842773,"_step":3557,"_timestamp":1.7473729009748125e+09,"train_loss":1.6298207611684346,"eval/dev_samples_per_second":13.09,"_wandb":{"runtime":35917},"train/loss":1.1786,"substitutions":4.88,"eval/test_wer":0.08848048503220916,"word_accuracy":93.02,"train_steps_per_second":0.977,"eval/steps_per_second":0.139,"eval/samples_per_second":12.997,"model_speed2size2_table":{"_latest_artifact_path":"wandb-client-artifact://li63ohwsnagsgukuzxxoedjq23zmywgayy9s567hfvm8iuj19u9laeb1yfyt7s5gx88u27qlfs25bna6xx34hq9yv5rrhz6pa1s7mu8ilv3xwtp7gu06fdhb5anlvjg9:latest/model_speed2size2_table.table.json","path":"media/table/model_speed2size2_table_3556_ffc3f22eaf8a279337f3.table.json","ncols":2,"nrows":1,"_type":"table-file","sha256":"ffc3f22eaf8a279337f31351ecc40b7b10ad3fc2530ff63bb94dfd99cf1707b4","size":95,"artifact_path":"wandb-client-artifact://zs3kneyjnk8zwtj296er3yaztikcv5ro8nl4jnqbua1dzk6zoaan1kxcdfuhwv9us50mlrguj0mgrvntft5p6iuju1qayquwa3b3l93g58r7o5ifatkwzdkuj0bq8y16/model_speed2size2_table.table.json"},"total_flos":0,"eval/test_loss":1.0758554935455322,"test_sample_index":111,"word_errors":8.84,"insertions":1.86,"eval/test_steps_per_second":0.136}
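The sclite-style fields in the summary (`substitutions`, `deletions`, `insertions`, `word_errors`, `word_accuracy`) appear to be percentages of reference words. Word error rate is the sum of all three error types. Word accuracy counts only correctly recognized words, so it ignores insertions. The sketch below checks that relationship using the logged values. It assumes these sclite figures belong to the test split, whose `eval/test_wer` of 0.0885 appears to agree with them to within their two-decimal rounding.

```python
# sclite figures copied from wandb-summary.json (percent of reference words)
substitutions, deletions, insertions = 4.88, 2.1, 1.86
test_wer = 0.08848048503220916  # eval/test_wer as a fraction

# WER counts every error type: (S + D + I) / N
word_errors = substitutions + deletions + insertions

# Accuracy counts correctly recognized words: (N - S - D) / N, so insertions are ignored
word_accuracy = 100.0 - substitutions - deletions

print(f"WER       {word_errors:.2f}%  (eval/test_wer: {test_wer * 100:.2f}%)")
print(f"accuracy  {word_accuracy:.2f}%")
print(f"100 - accuracy = {100.0 - word_accuracy:.2f}%, which understates WER by the insertion rate")
```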

wandb/run-20250515_192303-7xkscxrj/logs/debug-core.log ADDED

	@@ -0,0 +1,15 @@

+{"time":"2025-05-15T19:23:02.990833788Z","level":"INFO","msg":"main: starting server","port-filename":"/tmp/tmpi1e880he/port-4153506.txt","pid":4153506,"log-level":0,"disable-analytics":false,"shutdown-on-parent-exit":false}
+{"time":"2025-05-15T19:23:02.992445129Z","level":"INFO","msg":"Will exit if parent process dies.","ppid":4153506}
+{"time":"2025-05-15T19:23:02.992359253Z","level":"INFO","msg":"server is running","addr":{"IP":"127.0.0.1","Port":42841,"Zone":""}}
+{"time":"2025-05-15T19:23:03.146123389Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"127.0.0.1:33852"}
+{"time":"2025-05-15T19:23:03.494056011Z","level":"INFO","msg":"handleInformInit: received","streamId":"7xkscxrj","id":"127.0.0.1:33852"}
+{"time":"2025-05-15T19:23:03.597616412Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"7xkscxrj","id":"127.0.0.1:33852"}
+{"time":"2025-05-16T05:21:43.113058108Z","level":"INFO","msg":"handleInformFinish: finish message received","streamId":"7xkscxrj","id":"127.0.0.1:33852"}
+{"time":"2025-05-16T05:21:43.113219442Z","level":"INFO","msg":"handleInformFinish: stream closed","streamId":"7xkscxrj","id":"127.0.0.1:33852"}
+{"time":"2025-05-16T05:21:43.15553858Z","level":"INFO","msg":"handleInformTeardown: server teardown initiated","id":"127.0.0.1:33852"}
+{"time":"2025-05-16T05:21:43.155584566Z","level":"INFO","msg":"handleInformTeardown: server shutdown complete","id":"127.0.0.1:33852"}
+{"time":"2025-05-16T05:21:43.155594696Z","level":"INFO","msg":"server is shutting down"}
+{"time":"2025-05-16T05:21:43.155632972Z","level":"INFO","msg":"connection: closing","id":"127.0.0.1:33852"}
+{"time":"2025-05-16T05:21:43.155728782Z","level":"INFO","msg":"connection: closed successfully","id":"127.0.0.1:33852"}
+{"time":"2025-05-16T05:21:43.155739402Z","level":"INFO","msg":"connection: ManageConnectionData: connection closed","id":"127.0.0.1:33852"}
+{"time":"2025-05-16T05:21:43.155754471Z","level":"INFO","msg":"server is closed"}

wandb/run-20250515_192303-7xkscxrj/logs/debug-internal.log ADDED

	@@ -0,0 +1,17 @@

+{"time":"2025-05-15T19:23:03.494323592Z","level":"INFO","msg":"stream: starting","core version":"0.19.7","symlink path":"/media/disk/mh_dp/wandb/run-20250515_192303-7xkscxrj/logs/debug-core.log"}
+{"time":"2025-05-15T19:23:03.597586687Z","level":"INFO","msg":"created new stream","id":"7xkscxrj"}
+{"time":"2025-05-15T19:23:03.597610471Z","level":"INFO","msg":"stream: started","id":"7xkscxrj"}
+{"time":"2025-05-15T19:23:03.597663725Z","level":"INFO","msg":"writer: Do: started","stream_id":"7xkscxrj"}
+{"time":"2025-05-15T19:23:03.597740644Z","level":"INFO","msg":"sender: started","stream_id":"7xkscxrj"}
+{"time":"2025-05-15T19:23:03.597802771Z","level":"INFO","msg":"handler: started","stream_id":"7xkscxrj"}
+{"time":"2025-05-15T19:23:03.929230983Z","level":"INFO","msg":"Starting system monitor"}
+{"time":"2025-05-15T21:30:19.314744757Z","level":"INFO","msg":"api: retrying HTTP error","status":502,"url":"https://api.wandb.ai/files/xhorni20-fitvut/seq2seq_encoder-decoder_vox/7xkscxrj/file_stream","body":"\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>502 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}
+{"time":"2025-05-16T05:21:40.975811644Z","level":"INFO","msg":"Stopping system monitor"}
+{"time":"2025-05-16T05:21:40.97633554Z","level":"INFO","msg":"Stopped system monitor"}
+{"time":"2025-05-16T05:21:41.958538892Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+{"time":"2025-05-16T05:21:41.977317704Z","level":"INFO","msg":"handler: operation stats","stats":{"operations":[{"desc":"uploading history steps 3555-3557, summary, console lines 6010-6016","runtime_seconds":0.018694493}],"total_operations":1}}
+{"time":"2025-05-16T05:21:43.113094437Z","level":"INFO","msg":"stream: closing","id":"7xkscxrj"}
+{"time":"2025-05-16T05:21:43.113123361Z","level":"INFO","msg":"handler: closed","stream_id":"7xkscxrj"}
+{"time":"2025-05-16T05:21:43.113134703Z","level":"INFO","msg":"writer: Close: closed","stream_id":"7xkscxrj"}
+{"time":"2025-05-16T05:21:43.113199044Z","level":"INFO","msg":"sender: closed","stream_id":"7xkscxrj"}
+{"time":"2025-05-16T05:21:43.113213511Z","level":"INFO","msg":"stream: closed","id":"7xkscxrj"}

wandb/run-20250515_192303-7xkscxrj/logs/debug.log ADDED

	@@ -0,0 +1,35 @@

+2025-05-15 19:23:03,488 INFO    MainThread:4153506 [wandb_setup.py:_flush():67] Current SDK version is 0.19.7
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_setup.py:_flush():67] Configure stats pid to 4153506
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_setup.py:_flush():67] Loading settings from /home/azureuser/.config/wandb/settings
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_setup.py:_flush():67] Loading settings from /media/disk/mh_dp/wandb/settings
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_setup.py:_flush():67] Loading settings from environment variables
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_init.py:setup_run_log_directory():647] Logging user logs to /media/disk/mh_dp/wandb/run-20250515_192303-7xkscxrj/logs/debug.log
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_init.py:setup_run_log_directory():648] Logging internal logs to /media/disk/mh_dp/wandb/run-20250515_192303-7xkscxrj/logs/debug-internal.log
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_init.py:init():761] calling init triggers
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_init.py:init():766] wandb.init called with sweep_config: {}
+config: {'_wandb': {}}
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_init.py:init():784] starting backend
+2025-05-15 19:23:03,489 INFO    MainThread:4153506 [wandb_init.py:init():788] sending inform_init request
+2025-05-15 19:23:03,492 INFO    MainThread:4153506 [backend.py:_multiprocessing_setup():97] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+2025-05-15 19:23:03,492 INFO    MainThread:4153506 [wandb_init.py:init():803] backend started and connected
+2025-05-15 19:23:03,493 INFO    MainThread:4153506 [wandb_init.py:init():896] updated telemetry
+2025-05-15 19:23:03,498 INFO    MainThread:4153506 [wandb_init.py:init():920] communicating run to backend with 90.0 second timeout
+2025-05-15 19:23:03,927 INFO    MainThread:4153506 [wandb_init.py:init():995] starting run threads in backend
+2025-05-15 19:23:04,024 INFO    MainThread:4153506 [wandb_run.py:_console_start():2377] atexit reg
+2025-05-15 19:23:04,024 INFO    MainThread:4153506 [wandb_run.py:_redirect():2227] redirect: wrap_raw
+2025-05-15 19:23:04,024 INFO    MainThread:4153506 [wandb_run.py:_redirect():2292] Wrapping output streams.
+2025-05-15 19:23:04,024 INFO    MainThread:4153506 [wandb_run.py:_redirect():2317] Redirects installed.
+2025-05-15 19:23:04,026 INFO    MainThread:4153506 [wandb_init.py:init():1037] run started, returning control to user process
+2025-05-15 19:23:07,838 INFO    MainThread:4153506 [wandb_run.py:_config_callback():1261] config_cb None None {'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': True, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': None, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['SpeechEncoderDecoderModel'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': None, 'pad_token_id': 1, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': 0, 'task_specific_params': None, 'problem_type': None, '_name_or_path': './seq2seq_wav2vec2_bart-base_24k-en-voxpopuli', '_attn_implementation_autoset': True, 'transformers_version': '4.46.3', 'decoder': {'vocab_size': 50265, 'max_position_embeddings': 1024, 'd_model': 768, 'encoder_ffn_dim': 3072, 'encoder_layers': 6, 'encoder_attention_heads': 12, 'decoder_ffn_dim': 3072, 'decoder_layers': 6, 'decoder_attention_heads': 12, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.1, 'activation_function': 'gelu', 'init_std': 0.02, 
'encoder_layerdrop': 0.0, 'decoder_layerdrop': 0.0, 'classifier_dropout': 0.0, 'use_cache': True, 'num_hidden_layers': 6, 'scale_embedding': False, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': True, 'cross_attention_hidden_size': None, 'add_cross_attention': True, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': True, 'num_beams': 4, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 3, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': 0, 'forced_eos_token_id': 2, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['BartModel'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1, 'LABEL_2': 2}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 0, 'pad_token_id': 1, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': 2, 'task_specific_params': {'summarization': {'length_penalty': 1.0, 'max_length': 128, 'min_length': 12, 'num_beams': 4}, 'summarization_cnn': {'length_penalty': 2.0, 'max_length': 142, 'min_length': 56, 'num_beams': 4}, 'summarization_xsum': {'length_penalty': 1.0, 'max_length': 62, 'min_length': 11, 'num_beams': 6}}, 'problem_type': None, '_name_or_path': 'facebook/bart-base', '_attn_implementation_autoset': True, 'add_bias_logits': False, 'add_final_layer_norm': False, 'classif_dropout': 0.1, 
'gradient_checkpointing': False, 'normalize_before': False, 'normalize_embedding': True, 'model_type': 'bart'}, 'encoder': {'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float32', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['Wav2Vec2ForPreTraining'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 0, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'facebook/wav2vec2-base-en-voxpopuli-v2', '_attn_implementation_autoset': True, 'freeze_feat_extract_train': True, 'mask_channel_length': 10, 'mask_channel_min_space': 1, 'mask_channel_other': 0.0, 'mask_channel_prob': 0.0, 'mask_channel_selection': 'static', 'mask_time_min_space': 1, 'mask_time_other': 0.0, 'mask_time_selection': 'static', 'no_mask_channel_overlap': False, 'no_mask_time_overlap': False, 'num_feat_extract_layers': 7, 'hidden_size': 768, 'feat_extract_norm': 
'group', 'feat_extract_activation': 'gelu', 'conv_dim': [512, 512, 512, 512, 512, 512, 512], 'conv_stride': [5, 2, 2, 2, 2, 2, 2], 'conv_kernel': [10, 3, 3, 3, 3, 2, 2], 'conv_bias': False, 'num_conv_pos_embeddings': 128, 'num_conv_pos_embedding_groups': 16, 'num_hidden_layers': 12, 'intermediate_size': 3072, 'hidden_act': 'gelu', 'num_attention_heads': 12, 'hidden_dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'feat_proj_dropout': 0.0, 'final_dropout': 0.0, 'layerdrop': 0.0, 'layer_norm_eps': 1e-05, 'initializer_range': 0.02, 'vocab_size': 32, 'do_stable_layer_norm': False, 'use_weighted_layer_sum': False, 'apply_spec_augment': True, 'mask_time_prob': 0.25, 'mask_time_length': 30, 'mask_time_min_masks': 2, 'mask_feature_prob': 0.3, 'mask_feature_length': 30, 'mask_feature_min_masks': 1, 'num_codevectors_per_group': 320, 'num_codevector_groups': 2, 'contrastive_logits_temperature': 0.1, 'feat_quantizer_dropout': 0.0, 'num_negatives': 100, 'codevector_dim': 256, 'proj_codevector_dim': 256, 'diversity_loss_weight': 0.1, 'ctc_loss_reduction': 'sum', 'ctc_zero_infinity': False, 'add_adapter': True, 'adapter_kernel_size': 3, 'adapter_stride': 2, 'num_adapter_layers': 3, 'output_hidden_size': 768, 'adapter_attn_dim': None, 'classifier_proj_size': 256, 'tdnn_dim': [512, 512, 512, 512, 1500], 'tdnn_kernel': [5, 3, 3, 1, 1], 'tdnn_dilation': [1, 2, 3, 1, 1], 'xvector_output_dim': 512, 'model_type': 'wav2vec2'}, 'model_type': 'speech-encoder-decoder', 'processor_class': 'Wav2Vec2Processor', 'use_cache': False, 'forced_decoder_ids': None, 'output_dir': './seq2seq_wav2vec2_bart-base_24k-en-voxpopuli/t1_new1_spec', 'overwrite_output_dir': True, 'do_train': True, 'do_eval': True, 'do_predict': True, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 96, 'per_device_eval_batch_size': 96, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 
'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 0.0001, 'weight_decay': 0.01, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 20.0, 'max_steps': -1, 'lr_scheduler_type': 'cosine_with_min_lr', 'lr_scheduler_kwargs': {'min_lr': 5e-09}, 'warmup_ratio': 0.0, 'warmup_steps': 2000, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': './seq2seq_wav2vec2_bart-base_24k-en-voxpopuli/t1_new1_spec/runs/May15_19-23-02_achjo', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': 1, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': True, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 1000, 'dataloader_num_workers': 16, 'dataloader_prefetch_factor': 2, 'past_index': -1, 'run_name': 'facebook/voxpopuli_en_split-train_wav2vec2-bart_bs96_lr0.0001_ep20.0', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': True, 'metric_for_best_model': 'wer', 'greater_is_better': False, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 
'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.05, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'input_length', 'report_to': ['wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': False, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': None, 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'eval_use_gather_object': False, 'average_tokens_across_devices': False, 'sortish_sampler': False, 'predict_with_generate': True, 'generation_max_length': None, 'generation_num_beams': None, 'generation_config': None}
+2025-05-15 19:23:07,840 INFO    MainThread:4153506 [wandb_config.py:__setitem__():154] config set model/num_parameters = 201096832 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7fdd43c46990>>
+2025-05-15 19:23:07,840 INFO    MainThread:4153506 [wandb_run.py:_config_callback():1261] config_cb model/num_parameters 201096832 None
+2025-05-16 05:21:40,579 INFO    MainThread:4153506 [wandb_run.py:_config_callback():1261] config_cb ('_wandb', 'visualize', 'model_speed2size1') {'panel_type': 'Vega2', 'panel_config': {'panelDefId': 'wandb/scatter/v0', 'fieldSettings': {'x': 'Time per step', 'y': 'Trainable parameters'}, 'stringSettings': {'title': ''}, 'transform': {'name': 'tableWithLeafColNames'}, 'userQuery': {'queryFields': [{'name': 'runSets', 'args': [{'name': 'runSets', 'value': '${runSets}'}], 'fields': [{'name': 'id', 'fields': []}, {'name': 'name', 'fields': []}, {'name': '_defaultColorIndex', 'fields': []}, {'name': 'summaryTable', 'args': [{'name': 'tableKey', 'value': 'model_speed2size1_table'}], 'fields': []}]}]}}} None
+2025-05-16 05:21:40,886 INFO    MainThread:4153506 [wandb_run.py:_config_callback():1261] config_cb ('_wandb', 'visualize', 'model_speed2size2') {'panel_type': 'Vega2', 'panel_config': {'panelDefId': 'wandb/scatter/v0', 'fieldSettings': {'x': 'Time per step', 'y': 'Total parameters'}, 'stringSettings': {'title': ''}, 'transform': {'name': 'tableWithLeafColNames'}, 'userQuery': {'queryFields': [{'name': 'runSets', 'args': [{'name': 'runSets', 'value': '${runSets}'}], 'fields': [{'name': 'id', 'fields': []}, {'name': 'name', 'fields': []}, {'name': '_defaultColorIndex', 'fields': []}, {'name': 'summaryTable', 'args': [{'name': 'tableKey', 'value': 'model_speed2size2_table'}], 'fields': []}]}]}}} None
+2025-05-16 05:21:40,974 INFO    MainThread:4153506 [wandb_run.py:_finish():2112] finishing run xhorni20-fitvut/seq2seq_encoder-decoder_vox/7xkscxrj
+2025-05-16 05:21:40,975 INFO    MainThread:4153506 [wandb_run.py:_atexit_cleanup():2342] got exitcode: 0
+2025-05-16 05:21:40,975 INFO    MainThread:4153506 [wandb_run.py:_restore():2324] restore
+2025-05-16 05:21:40,975 INFO    MainThread:4153506 [wandb_run.py:_restore():2330] restore done
+2025-05-16 05:21:43,110 INFO    MsgRouterThr:4153506 [mailbox.py:close():115] Closing mailbox, abandoning 1 handles.
+2025-05-16 05:21:43,111 INFO    MainThread:4153506 [wandb_run.py:_footer_history_summary_info():3958] rendering history
+2025-05-16 05:21:43,112 INFO    MainThread:4153506 [wandb_run.py:_footer_history_summary_info():3990] rendering summary
+2025-05-16 05:21:43,112 INFO    MainThread:4153506 [wandb_run.py:_footer_sync_info():3919] logging synced files

wandb/run-20250515_192303-7xkscxrj/run-7xkscxrj.wandb ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5c576cb9e4a59adc74876e8b216885ecc868eb504703fd30c956b98a33e5571e
+size 19765491