manu committed on Jul 14

Commit

cebca1f

verified

1 Parent(s): 06dcf00

Upload folder using huggingface_hub


Files changed (26)

.gitattributes +1 -0
README.md +202 -0
adapter_config.json +31 -0
adapter_model.safetensors +3 -0
added_tokens.json +24 -0
chat_template.jinja +7 -0
checkpoint-2310/README.md +202 -0
checkpoint-2310/adapter_config.json +31 -0
checkpoint-2310/adapter_model.safetensors +3 -0
checkpoint-2310/optimizer.pt +3 -0
checkpoint-2310/rng_state_0.pth +3 -0
checkpoint-2310/rng_state_1.pth +3 -0
checkpoint-2310/rng_state_2.pth +3 -0
checkpoint-2310/rng_state_3.pth +3 -0
checkpoint-2310/scheduler.pt +3 -0
checkpoint-2310/trainer_state.json +1835 -0
checkpoint-2310/training_args.bin +3 -0
git_hash.txt +1 -0
merges.txt +0 -0
preprocessor_config.json +31 -0
special_tokens_map.json +38 -0
tokenizer.json +3 -0
tokenizer_config.json +222 -0
train_colqwenomni_model.py +103 -0
video_preprocessor_config.json +56 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: vidore/colqwen2.5omni-base
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.2

adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "vidore/colqwen2.5omni-base",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": "gaussian",
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": "(.*(model)(?!.*visual).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)",
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
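
For reference, when `target_modules` is a single string, PEFT treats it as a regular expression that must match the full module name. The sketch below applies the pattern from this config to a few module names. The names are hypothetical, chosen only to exercise each branch of the pattern; they were not read from the base model. It also computes the LoRA scaling factor `lora_alpha / r`, which is the relevant formula here because `use_rslora` is false.

```python
import re

# Pattern copied verbatim from adapter_config.json.
TARGET_MODULES = (
    r"(.*(model)(?!.*visual).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$"
    r"|.*(custom_text_proj).*$)"
)

# Hypothetical module names, for illustration only.
candidates = [
    "model.layers.0.self_attn.q_proj",    # attention projection in the language model
    "model.layers.0.mlp.down_proj",       # MLP projection in the language model
    "model.visual.blocks.0.attn.q_proj",  # rejected by the (?!.*visual) lookahead
    "model.layers.0.input_layernorm",     # not one of the listed projections
    "custom_text_proj",                   # matched by the second alternative
]


def is_targeted(name: str) -> bool:
    """Return True if the whole module name matches the target pattern."""
    return re.fullmatch(TARGET_MODULES, name) is not None


targeted = [name for name in candidates if is_targeted(name)]

# Standard LoRA scaling (no rank-stabilized variant): alpha / r.
lora_alpha, r = 32, 32
scaling = lora_alpha / r
```

With `lora_alpha` equal to `r`, the adapter update is applied at a scale of 1, and the negative lookahead keeps any module under a `visual` submodule out of the adapter.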

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a3160c931d298df8bc7ea3ca33f18b8fdf699fe5c8f23ccf7a5389b12310dd08
+size 239815040
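
The three added lines above are a Git LFS pointer rather than the weights themselves: the repository records the pointer spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for this illustration, not a function from `huggingface_hub` or Git LFS.

```python
# Pointer contents copied from the adapter_model.safetensors hunk.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a3160c931d298df8bc7ea3ca33f18b8fdf699fe5c8f23ccf7a5389b12310dd08
size 239815040
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size"] / 2**20  # bytes to MiB
```

The same `oid` and `size` appear in the `checkpoint-2310/adapter_model.safetensors` hunk, so the top-level adapter is byte-identical to the adapter saved in the final checkpoint.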

added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|AUDIO|>": 151646,
+  "<|IMAGE|>": 151655,
+  "<|VIDEO|>": 151656,
+  "<|audio_bos|>": 151647,
+  "<|audio_eos|>": 151648,
+  "<|box_end|>": 151649,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|vision_bos|>": 151652,
+  "<|vision_eos|>": 151653,
+  "<|vision_pad|>": 151654
+}

chat_template.jinja ADDED

	@@ -0,0 +1,7 @@

+{% set audio_count = namespace(value=0) %}{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system
+You are a helpful assistant.<|im_end|>
+{% endif %}<|im_start|>{{ message['role'] }}
+{% if message['content'] is string %}{{ message['content'] }}<|im_end|>
+{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_bos|><|IMAGE|><|vision_eos|>{% elif content['type'] == 'audio' or 'audio' in content or 'audio_url' in content %}{% set audio_count.value = audio_count.value + 1 %}{% if add_audio_id %}Audio {{ audio_count.value }}: {% endif %}<|audio_bos|><|AUDIO|><|audio_eos|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_bos|><|VIDEO|><|vision_eos|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>
+{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant
+{% endif %}
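
To make the template's output easier to follow, here is a simplified pure-Python mirror of its logic. It is a sketch under stated assumptions: it covers only string content, image parts, and text parts (the audio and video branches are omitted), and `render_prompt` is a hypothetical name, not an API of `transformers`. The real prompt should be produced by rendering the template itself.

```python
def render_prompt(messages, add_generation_prompt=False, add_vision_id=False):
    """Simplified mirror of chat_template.jinja: text and image parts only."""
    out = []
    image_count = 0
    for i, message in enumerate(messages):
        # A default system turn is injected when the conversation does not start with one.
        if i == 0 and message["role"] != "system":
            out.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
        out.append(f"<|im_start|>{message['role']}\n")
        content = message["content"]
        if isinstance(content, str):
            out.append(content)
        else:
            for part in content:
                if part.get("type") == "image" or "image" in part or "image_url" in part:
                    image_count += 1
                    if add_vision_id:
                        out.append(f"Picture {image_count}: ")
                    # Each image becomes one placeholder wrapped in vision markers.
                    out.append("<|vision_bos|><|IMAGE|><|vision_eos|>")
                elif "text" in part:
                    out.append(part["text"])
        out.append("<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)


messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this page."},
        ],
    }
]
prompt = render_prompt(messages, add_generation_prompt=True)
```

The placeholder tokens used here (`<|vision_bos|>`, `<|IMAGE|>`, `<|vision_eos|>`, `<|im_start|>`, `<|im_end|>`) are the ones assigned IDs in `added_tokens.json`.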

checkpoint-2310/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: ./models/base_models/colqwen2.5omni-base
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.2

checkpoint-2310/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "./models/base_models/colqwen2.5omni-base",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": "gaussian",
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": "(.*(model)(?!.*visual).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)",
+  "task_type": "FEATURE_EXTRACTION",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-2310/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a3160c931d298df8bc7ea3ca33f18b8fdf699fe5c8f23ccf7a5389b12310dd08
+size 239815040

checkpoint-2310/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb0841a27b3e5e0653154852aebe78d5eb063396ff95536280b4406694df06d7
+size 479921873

checkpoint-2310/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:83046fac2b7438ad2a4687f65dc0b047f36a2b0978aca3f37ad9424882f89aa6
+size 15365

checkpoint-2310/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a1968fdb12179df42f71a51269af81b8919dfaa8db1d8113f94356d44da3413
+size 15365

checkpoint-2310/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4478fdb9cf68d30391c2a54b00325f262bee7827b59f540b311515648be1a848
+size 15365

checkpoint-2310/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a85da7d45d0a4cdcd5689324d284c9d4bf7d9eacb4a1180d4c3df6d204fe2167
+size 15365

checkpoint-2310/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:25b0085c1e047b756f13db0ba3741cc59e9579c6b3663e6f0c6a07707a16fae7
+size 1465

checkpoint-2310/trainer_state.json ADDED

	@@ -0,0 +1,1835 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 5.0,
+  "eval_steps": 100,
+  "global_step": 2310,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.021645021645021644,
+      "grad_norm": 1.49063241481781,
+      "learning_rate": 4.5e-06,
+      "loss": 5.15,
+      "step": 10
+    },
+    {
+      "epoch": 0.04329004329004329,
+      "grad_norm": 2.3369944095611572,
+      "learning_rate": 9.5e-06,
+      "loss": 5.1055,
+      "step": 20
+    },
+    {
+      "epoch": 0.06493506493506493,
+      "grad_norm": 2.44476318359375,
+      "learning_rate": 1.45e-05,
+      "loss": 4.9139,
+      "step": 30
+    },
+    {
+      "epoch": 0.08658008658008658,
+      "grad_norm": 4.037220001220703,
+      "learning_rate": 1.9500000000000003e-05,
+      "loss": 4.4762,
+      "step": 40
+    },
+    {
+      "epoch": 0.10822510822510822,
+      "grad_norm": 5.722354412078857,
+      "learning_rate": 2.45e-05,
+      "loss": 3.7558,
+      "step": 50
+    },
+    {
+      "epoch": 0.12987012987012986,
+      "grad_norm": 6.332724571228027,
+      "learning_rate": 2.95e-05,
+      "loss": 2.7035,
+      "step": 60
+    },
+    {
+      "epoch": 0.15151515151515152,
+      "grad_norm": 2.5632152557373047,
+      "learning_rate": 3.45e-05,
+      "loss": 2.0313,
+      "step": 70
+    },
+    {
+      "epoch": 0.17316017316017315,
+      "grad_norm": 4.372506141662598,
+      "learning_rate": 3.9500000000000005e-05,
+      "loss": 1.5306,
+      "step": 80
+    },
+    {
+      "epoch": 0.19480519480519481,
+      "grad_norm": 2.5026726722717285,
+      "learning_rate": 4.4500000000000004e-05,
+      "loss": 1.2437,
+      "step": 90
+    },
+    {
+      "epoch": 0.21645021645021645,
+      "grad_norm": 1.9875411987304688,
+      "learning_rate": 4.9500000000000004e-05,
+      "loss": 0.9561,
+      "step": 100
+    },
+    {
+      "epoch": 0.21645021645021645,
+      "eval_loss": 0.19921594858169556,
+      "eval_runtime": 11.6195,
+      "eval_samples_per_second": 43.031,
+      "eval_steps_per_second": 0.688,
+      "step": 100
+    },
+    {
+      "epoch": 0.23809523809523808,
+      "grad_norm": 1.7703560590744019,
+      "learning_rate": 4.979638009049774e-05,
+      "loss": 0.7735,
+      "step": 110
+    },
+    {
+      "epoch": 0.2597402597402597,
+      "grad_norm": 1.7760099172592163,
+      "learning_rate": 4.957013574660634e-05,
+      "loss": 0.6721,
+      "step": 120
+    },
+    {
+      "epoch": 0.2813852813852814,
+      "grad_norm": 2.115910530090332,
+      "learning_rate": 4.934389140271494e-05,
+      "loss": 0.6748,
+      "step": 130
+    },
+    {
+      "epoch": 0.30303030303030304,
+      "grad_norm": 2.5156407356262207,
+      "learning_rate": 4.911764705882353e-05,
+      "loss": 0.6629,
+      "step": 140
+    },
+    {
+      "epoch": 0.3246753246753247,
+      "grad_norm": 1.8950108289718628,
+      "learning_rate": 4.8891402714932124e-05,
+      "loss": 0.6032,
+      "step": 150
+    },
+    {
+      "epoch": 0.3463203463203463,
+      "grad_norm": 2.0162742137908936,
+      "learning_rate": 4.8665158371040724e-05,
+      "loss": 0.5883,
+      "step": 160
+    },
+    {
+      "epoch": 0.36796536796536794,
+      "grad_norm": 1.5109184980392456,
+      "learning_rate": 4.843891402714932e-05,
+      "loss": 0.6149,
+      "step": 170
+    },
+    {
+      "epoch": 0.38961038961038963,
+      "grad_norm": 1.7340754270553589,
+      "learning_rate": 4.821266968325792e-05,
+      "loss": 0.589,
+      "step": 180
+    },
+    {
+      "epoch": 0.41125541125541126,
+      "grad_norm": 1.6524324417114258,
+      "learning_rate": 4.798642533936652e-05,
+      "loss": 0.5883,
+      "step": 190
+    },
+    {
+      "epoch": 0.4329004329004329,
+      "grad_norm": 1.5651330947875977,
+      "learning_rate": 4.7760180995475115e-05,
+      "loss": 0.5707,
+      "step": 200
+    },
+    {
+      "epoch": 0.4329004329004329,
+      "eval_loss": 0.13576850295066833,
+      "eval_runtime": 10.9595,
+      "eval_samples_per_second": 45.623,
+      "eval_steps_per_second": 0.73,
+      "step": 200
+    },
+    {
+      "epoch": 0.45454545454545453,
+      "grad_norm": 1.4788271188735962,
+      "learning_rate": 4.753393665158371e-05,
+      "loss": 0.5889,
+      "step": 210
+    },
+    {
+      "epoch": 0.47619047619047616,
+      "grad_norm": 1.4938485622406006,
+      "learning_rate": 4.730769230769231e-05,
+      "loss": 0.5377,
+      "step": 220
+    },
+    {
+      "epoch": 0.49783549783549785,
+      "grad_norm": 1.3584009408950806,
+      "learning_rate": 4.7081447963800906e-05,
+      "loss": 0.5202,
+      "step": 230
+    },
+    {
+      "epoch": 0.5194805194805194,
+      "grad_norm": 3.0394833087921143,
+      "learning_rate": 4.6855203619909505e-05,
+      "loss": 0.5044,
+      "step": 240
+    },
+    {
+      "epoch": 0.5411255411255411,
+      "grad_norm": 1.2974430322647095,
+      "learning_rate": 4.6628959276018105e-05,
+      "loss": 0.5448,
+      "step": 250
+    },
+    {
+      "epoch": 0.5627705627705628,
+      "grad_norm": 1.4595730304718018,
+      "learning_rate": 4.64027149321267e-05,
+      "loss": 0.4816,
+      "step": 260
+    },
+    {
+      "epoch": 0.5844155844155844,
+      "grad_norm": 1.248807430267334,
+      "learning_rate": 4.61764705882353e-05,
+      "loss": 0.4622,
+      "step": 270
+    },
+    {
+      "epoch": 0.6060606060606061,
+      "grad_norm": 1.4656962156295776,
+      "learning_rate": 4.595022624434389e-05,
+      "loss": 0.4807,
+      "step": 280
+    },
+    {
+      "epoch": 0.6277056277056277,
+      "grad_norm": 1.5670865774154663,
+      "learning_rate": 4.572398190045249e-05,
+      "loss": 0.5012,
+      "step": 290
+    },
+    {
+      "epoch": 0.6493506493506493,
+      "grad_norm": 1.6379127502441406,
+      "learning_rate": 4.549773755656109e-05,
+      "loss": 0.467,
+      "step": 300
+    },
+    {
+      "epoch": 0.6493506493506493,
+      "eval_loss": 0.1379440426826477,
+      "eval_runtime": 11.0085,
+      "eval_samples_per_second": 45.419,
+      "eval_steps_per_second": 0.727,
+      "step": 300
+    },
+    {
+      "epoch": 0.670995670995671,
+      "grad_norm": 1.4250965118408203,
+      "learning_rate": 4.527149321266969e-05,
+      "loss": 0.4903,
+      "step": 310
+    },
+    {
+      "epoch": 0.6926406926406926,
+      "grad_norm": 1.3096636533737183,
+      "learning_rate": 4.504524886877829e-05,
+      "loss": 0.468,
+      "step": 320
+    },
+    {
+      "epoch": 0.7142857142857143,
+      "grad_norm": 1.3946788311004639,
+      "learning_rate": 4.481900452488688e-05,
+      "loss": 0.4628,
+      "step": 330
+    },
+    {
+      "epoch": 0.7359307359307359,
+      "grad_norm": 1.3430134057998657,
+      "learning_rate": 4.459276018099548e-05,
+      "loss": 0.4711,
+      "step": 340
+    },
+    {
+      "epoch": 0.7575757575757576,
+      "grad_norm": 1.246222734451294,
+      "learning_rate": 4.436651583710407e-05,
+      "loss": 0.4829,
+      "step": 350
+    },
+    {
+      "epoch": 0.7792207792207793,
+      "grad_norm": 1.406936764717102,
+      "learning_rate": 4.414027149321267e-05,
+      "loss": 0.4452,
+      "step": 360
+    },
+    {
+      "epoch": 0.8008658008658008,
+      "grad_norm": 1.645504355430603,
+      "learning_rate": 4.391402714932127e-05,
+      "loss": 0.4385,
+      "step": 370
+    },
+    {
+      "epoch": 0.8225108225108225,
+      "grad_norm": 1.7644546031951904,
+      "learning_rate": 4.368778280542987e-05,
+      "loss": 0.4514,
+      "step": 380
+    },
+    {
+      "epoch": 0.8441558441558441,
+      "grad_norm": 1.7779700756072998,
+      "learning_rate": 4.346153846153846e-05,
+      "loss": 0.4785,
+      "step": 390
+    },
+    {
+      "epoch": 0.8658008658008658,
+      "grad_norm": 1.3203119039535522,
+      "learning_rate": 4.323529411764706e-05,
+      "loss": 0.509,
+      "step": 400
+    },
+    {
+      "epoch": 0.8658008658008658,
+      "eval_loss": 0.11263184249401093,
+      "eval_runtime": 11.019,
+      "eval_samples_per_second": 45.376,
+      "eval_steps_per_second": 0.726,
+      "step": 400
+    },
+    {
+      "epoch": 0.8874458874458875,
+      "grad_norm": 1.3130112886428833,
+      "learning_rate": 4.300904977375566e-05,
+      "loss": 0.4583,
+      "step": 410
+    },
+    {
+      "epoch": 0.9090909090909091,
+      "grad_norm": 1.0591143369674683,
+      "learning_rate": 4.2782805429864254e-05,
+      "loss": 0.4318,
+      "step": 420
+    },
+    {
+      "epoch": 0.9307359307359307,
+      "grad_norm": 1.389012336730957,
+      "learning_rate": 4.255656108597285e-05,
+      "loss": 0.4554,
+      "step": 430
+    },
+    {
+      "epoch": 0.9523809523809523,
+      "grad_norm": 1.3405207395553589,
+      "learning_rate": 4.233031674208145e-05,
+      "loss": 0.4485,
+      "step": 440
+    },
+    {
+      "epoch": 0.974025974025974,
+      "grad_norm": 1.2793899774551392,
+      "learning_rate": 4.2104072398190045e-05,
+      "loss": 0.4537,
+      "step": 450
+    },
+    {
+      "epoch": 0.9956709956709957,
+      "grad_norm": 1.6418206691741943,
+      "learning_rate": 4.1877828054298645e-05,
+      "loss": 0.4446,
+      "step": 460
+    },
+    {
+      "epoch": 1.0173160173160174,
+      "grad_norm": 1.7021077871322632,
+      "learning_rate": 4.1651583710407244e-05,
+      "loss": 0.4476,
+      "step": 470
+    },
+    {
+      "epoch": 1.0389610389610389,
+      "grad_norm": 2.451876401901245,
+      "learning_rate": 4.142533936651584e-05,
+      "loss": 0.4221,
+      "step": 480
+    },
+    {
+      "epoch": 1.0606060606060606,
+      "grad_norm": 1.5263983011245728,
+      "learning_rate": 4.1199095022624436e-05,
+      "loss": 0.4121,
+      "step": 490
+    },
+    {
+      "epoch": 1.0822510822510822,
+      "grad_norm": 1.1231575012207031,
+      "learning_rate": 4.0972850678733035e-05,
+      "loss": 0.4399,
+      "step": 500
+    },
+    {
+      "epoch": 1.0822510822510822,
+      "eval_loss": 0.107670359313488,
+      "eval_runtime": 11.2498,
+      "eval_samples_per_second": 44.445,
+      "eval_steps_per_second": 0.711,
+      "step": 500
+    },
+    {
+      "epoch": 1.103896103896104,
+      "grad_norm": 1.3337820768356323,
+      "learning_rate": 4.074660633484163e-05,
+      "loss": 0.3778,
+      "step": 510
+    },
+    {
+      "epoch": 1.1255411255411256,
+      "grad_norm": 1.2338522672653198,
+      "learning_rate": 4.052036199095023e-05,
+      "loss": 0.399,
+      "step": 520
+    },
+    {
+      "epoch": 1.1471861471861473,
+      "grad_norm": 0.9891630411148071,
+      "learning_rate": 4.029411764705883e-05,
+      "loss": 0.4085,
+      "step": 530
+    },
+    {
+      "epoch": 1.1688311688311688,
+      "grad_norm": 2.232311964035034,
+      "learning_rate": 4.0067873303167426e-05,
+      "loss": 0.4006,
+      "step": 540
+    },
+    {
+      "epoch": 1.1904761904761905,
+      "grad_norm": 1.403098225593567,
+      "learning_rate": 3.984162895927602e-05,
+      "loss": 0.425,
+      "step": 550
+    },
+    {
+      "epoch": 1.2121212121212122,
+      "grad_norm": 1.367215633392334,
+      "learning_rate": 3.961538461538462e-05,
+      "loss": 0.3828,
+      "step": 560
+    },
+    {
+      "epoch": 1.2337662337662338,
+      "grad_norm": 1.7988994121551514,
+      "learning_rate": 3.938914027149321e-05,
+      "loss": 0.3941,
+      "step": 570
+    },
+    {
+      "epoch": 1.2554112554112553,
+      "grad_norm": 1.0070048570632935,
+      "learning_rate": 3.916289592760181e-05,
+      "loss": 0.3792,
+      "step": 580
+    },
+    {
+      "epoch": 1.277056277056277,
+      "grad_norm": 1.2809069156646729,
+      "learning_rate": 3.893665158371041e-05,
+      "loss": 0.3711,
+      "step": 590
+    },
+    {
+      "epoch": 1.2987012987012987,
+      "grad_norm": 1.369131326675415,
+      "learning_rate": 3.871040723981901e-05,
+      "loss": 0.4104,
+      "step": 600
+    },
+    {
+      "epoch": 1.2987012987012987,
+      "eval_loss": 0.11119061708450317,
+      "eval_runtime": 11.018,
+      "eval_samples_per_second": 45.38,
+      "eval_steps_per_second": 0.726,
+      "step": 600
+    },
+    {
+      "epoch": 1.3203463203463204,
+      "grad_norm": 1.2250093221664429,
+      "learning_rate": 3.848416289592761e-05,
+      "loss": 0.3638,
+      "step": 610
+    },
+    {
+      "epoch": 1.341991341991342,
+      "grad_norm": 1.4213260412216187,
+      "learning_rate": 3.82579185520362e-05,
+      "loss": 0.408,
+      "step": 620
+    },
+    {
+      "epoch": 1.3636363636363638,
+      "grad_norm": 1.2376580238342285,
+      "learning_rate": 3.8031674208144794e-05,
+      "loss": 0.3746,
+      "step": 630
+    },
+    {
+      "epoch": 1.3852813852813852,
+      "grad_norm": 1.5489083528518677,
+      "learning_rate": 3.780542986425339e-05,
+      "loss": 0.4002,
+      "step": 640
+    },
+    {
+      "epoch": 1.406926406926407,
+      "grad_norm": 1.3500491380691528,
+      "learning_rate": 3.757918552036199e-05,
+      "loss": 0.4152,
+      "step": 650
+    },
+    {
+      "epoch": 1.4285714285714286,
+      "grad_norm": 1.2030531167984009,
+      "learning_rate": 3.735294117647059e-05,
+      "loss": 0.3842,
+      "step": 660
+    },
+    {
+      "epoch": 1.4502164502164503,
+      "grad_norm": 1.205944299697876,
+      "learning_rate": 3.712669683257919e-05,
+      "loss": 0.3756,
+      "step": 670
+    },
+    {
+      "epoch": 1.4718614718614718,
+      "grad_norm": 1.2800495624542236,
+      "learning_rate": 3.6900452488687784e-05,
+      "loss": 0.362,
+      "step": 680
+    },
+    {
+      "epoch": 1.4935064935064934,
+      "grad_norm": 1.3327527046203613,
+      "learning_rate": 3.6674208144796376e-05,
+      "loss": 0.3892,
+      "step": 690
+    },
+    {
+      "epoch": 1.5151515151515151,
+      "grad_norm": 1.760610818862915,
+      "learning_rate": 3.6447963800904976e-05,
+      "loss": 0.4004,
+      "step": 700
+    },
+    {
+      "epoch": 1.5151515151515151,
+      "eval_loss": 0.10533424466848373,
+      "eval_runtime": 10.9974,
+      "eval_samples_per_second": 45.465,
+      "eval_steps_per_second": 0.727,
+      "step": 700
+    },
+    {
+      "epoch": 1.5367965367965368,
+      "grad_norm": 1.2114096879959106,
+      "learning_rate": 3.6221719457013575e-05,
+      "loss": 0.3773,
+      "step": 710
+    },
+    {
+      "epoch": 1.5584415584415585,
+      "grad_norm": 1.234147071838379,
+      "learning_rate": 3.5995475113122175e-05,
+      "loss": 0.3654,
+      "step": 720
+    },
+    {
+      "epoch": 1.5800865800865802,
+      "grad_norm": 1.1402283906936646,
+      "learning_rate": 3.5769230769230774e-05,
+      "loss": 0.3956,
+      "step": 730
+    },
+    {
+      "epoch": 1.601731601731602,
+      "grad_norm": 1.3863605260849,
+      "learning_rate": 3.5542986425339367e-05,
+      "loss": 0.3847,
+      "step": 740
+    },
+    {
+      "epoch": 1.6233766233766234,
+      "grad_norm": 1.1907293796539307,
+      "learning_rate": 3.5316742081447966e-05,
+      "loss": 0.3864,
+      "step": 750
+    },
+    {
+      "epoch": 1.645021645021645,
+      "grad_norm": 1.4367133378982544,
+      "learning_rate": 3.509049773755656e-05,
+      "loss": 0.3764,
+      "step": 760
+    },
+    {
+      "epoch": 1.6666666666666665,
+      "grad_norm": 5.571836948394775,
+      "learning_rate": 3.486425339366516e-05,
+      "loss": 0.3986,
+      "step": 770
+    },
+    {
+      "epoch": 1.6883116883116882,
+      "grad_norm": 1.7412991523742676,
+      "learning_rate": 3.463800904977376e-05,
+      "loss": 0.3659,
+      "step": 780
+    },
+    {
+      "epoch": 1.70995670995671,
+      "grad_norm": 1.2746937274932861,
+      "learning_rate": 3.441176470588236e-05,
+      "loss": 0.405,
+      "step": 790
+    },
+    {
+      "epoch": 1.7316017316017316,
+      "grad_norm": 1.7823095321655273,
+      "learning_rate": 3.418552036199095e-05,
+      "loss": 0.3756,
+      "step": 800
+    },
+    {
+      "epoch": 1.7316017316017316,
+      "eval_loss": 0.10027167201042175,
+      "eval_runtime": 11.0379,
+      "eval_samples_per_second": 45.299,
+      "eval_steps_per_second": 0.725,
+      "step": 800
+    },
+    {
+      "epoch": 1.7532467532467533,
+      "grad_norm": 1.605469822883606,
+      "learning_rate": 3.395927601809955e-05,
+      "loss": 0.3658,
+      "step": 810
+    },
+    {
+      "epoch": 1.774891774891775,
+      "grad_norm": 1.058374285697937,
+      "learning_rate": 3.373303167420815e-05,
+      "loss": 0.3687,
+      "step": 820
+    },
+    {
+      "epoch": 1.7965367965367967,
+      "grad_norm": 1.0704131126403809,
+      "learning_rate": 3.350678733031674e-05,
+      "loss": 0.3884,
+      "step": 830
+    },
+    {
+      "epoch": 1.8181818181818183,
+      "grad_norm": 1.3040030002593994,
+      "learning_rate": 3.328054298642534e-05,
+      "loss": 0.415,
+      "step": 840
+    },
+    {
+      "epoch": 1.8398268398268398,
+      "grad_norm": 1.4915244579315186,
+      "learning_rate": 3.305429864253394e-05,
+      "loss": 0.3376,
+      "step": 850
+    },
+    {
+      "epoch": 1.8614718614718615,
+      "grad_norm": 1.2545368671417236,
+      "learning_rate": 3.282805429864253e-05,
+      "loss": 0.3834,
+      "step": 860
+    },
+    {
+      "epoch": 1.883116883116883,
+      "grad_norm": 1.5177216529846191,
+      "learning_rate": 3.260180995475113e-05,
+      "loss": 0.3908,
+      "step": 870
+    },
+    {
+      "epoch": 1.9047619047619047,
+      "grad_norm": 2.2402610778808594,
+      "learning_rate": 3.237556561085973e-05,
+      "loss": 0.3801,
+      "step": 880
+    },
+    {
+      "epoch": 1.9264069264069263,
+      "grad_norm": 1.0384106636047363,
+      "learning_rate": 3.214932126696833e-05,
+      "loss": 0.3801,
+      "step": 890
+    },
+    {
+      "epoch": 1.948051948051948,
+      "grad_norm": 1.3252601623535156,
+      "learning_rate": 3.192307692307692e-05,
+      "loss": 0.3787,
+      "step": 900
+    },
+    {
+      "epoch": 1.948051948051948,
+      "eval_loss": 0.09781702607870102,
+      "eval_runtime": 11.1554,
+      "eval_samples_per_second": 44.821,
+      "eval_steps_per_second": 0.717,
+      "step": 900
+    },
+    {
+      "epoch": 1.9696969696969697,
+      "grad_norm": 1.3580507040023804,
+      "learning_rate": 3.169683257918552e-05,
+      "loss": 0.3583,
+      "step": 910
+    },
+    {
+      "epoch": 1.9913419913419914,
+      "grad_norm": 1.4314372539520264,
+      "learning_rate": 3.147058823529412e-05,
+      "loss": 0.3653,
+      "step": 920
+    },
+    {
+      "epoch": 2.012987012987013,
+      "grad_norm": 1.2172566652297974,
+      "learning_rate": 3.1244343891402714e-05,
+      "loss": 0.3238,
+      "step": 930
+    },
+    {
+      "epoch": 2.034632034632035,
+      "grad_norm": 1.4818902015686035,
+      "learning_rate": 3.1018099547511314e-05,
+      "loss": 0.3436,
+      "step": 940
+    },
+    {
+      "epoch": 2.0562770562770565,
+      "grad_norm": 1.1465860605239868,
+      "learning_rate": 3.079185520361991e-05,
+      "loss": 0.3425,
+      "step": 950
+    },
+    {
+      "epoch": 2.0779220779220777,
+      "grad_norm": 1.903261423110962,
+      "learning_rate": 3.056561085972851e-05,
+      "loss": 0.3647,
+      "step": 960
+    },
+    {
+      "epoch": 2.0995670995670994,
+      "grad_norm": 1.3892687559127808,
+      "learning_rate": 3.03393665158371e-05,
+      "loss": 0.3223,
+      "step": 970
+    },
+    {
+      "epoch": 2.121212121212121,
+      "grad_norm": 1.5258572101593018,
+      "learning_rate": 3.01131221719457e-05,
+      "loss": 0.3668,
+      "step": 980
+    },
+    {
+      "epoch": 2.142857142857143,
+      "grad_norm": 1.298425316810608,
+      "learning_rate": 2.98868778280543e-05,
+      "loss": 0.3487,
+      "step": 990
+    },
+    {
+      "epoch": 2.1645021645021645,
+      "grad_norm": 1.8322169780731201,
+      "learning_rate": 2.9660633484162896e-05,
+      "loss": 0.3203,
+      "step": 1000
+    },
+    {
+      "epoch": 2.1645021645021645,
+      "eval_loss": 0.10291534662246704,
+      "eval_runtime": 11.0546,
+      "eval_samples_per_second": 45.23,
+      "eval_steps_per_second": 0.724,
+      "step": 1000
+    },
+    {
+      "epoch": 2.186147186147186,
+      "grad_norm": 1.371580958366394,
+      "learning_rate": 2.9434389140271496e-05,
+      "loss": 0.3089,
+      "step": 1010
+    },
+    {
+      "epoch": 2.207792207792208,
+      "grad_norm": 1.2570271492004395,
+      "learning_rate": 2.9208144796380095e-05,
+      "loss": 0.3258,
+      "step": 1020
+    },
+    {
+      "epoch": 2.2294372294372296,
+      "grad_norm": 1.7881371974945068,
+      "learning_rate": 2.898190045248869e-05,
+      "loss": 0.3281,
+      "step": 1030
+    },
+    {
+      "epoch": 2.2510822510822512,
+      "grad_norm": 1.2721039056777954,
+      "learning_rate": 2.8755656108597284e-05,
+      "loss": 0.3085,
+      "step": 1040
+    },
+    {
+      "epoch": 2.2727272727272725,
+      "grad_norm": 1.4111952781677246,
+      "learning_rate": 2.8529411764705883e-05,
+      "loss": 0.3368,
+      "step": 1050
+    },
+    {
+      "epoch": 2.2943722943722946,
+      "grad_norm": 1.2382339239120483,
+      "learning_rate": 2.830316742081448e-05,
+      "loss": 0.3331,
+      "step": 1060
+    },
+    {
+      "epoch": 2.316017316017316,
+      "grad_norm": 1.390204906463623,
+      "learning_rate": 2.807692307692308e-05,
+      "loss": 0.3604,
+      "step": 1070
+    },
+    {
+      "epoch": 2.3376623376623376,
+      "grad_norm": 1.046399712562561,
+      "learning_rate": 2.7850678733031678e-05,
+      "loss": 0.3264,
+      "step": 1080
+    },
+    {
+      "epoch": 2.3593073593073592,
+      "grad_norm": 1.4382643699645996,
+      "learning_rate": 2.7624434389140274e-05,
+      "loss": 0.3108,
+      "step": 1090
+    },
+    {
+      "epoch": 2.380952380952381,
+      "grad_norm": 2.0286881923675537,
+      "learning_rate": 2.7398190045248873e-05,
+      "loss": 0.3222,
+      "step": 1100
+    },
+    {
+      "epoch": 2.380952380952381,
+      "eval_loss": 0.09695376455783844,
+      "eval_runtime": 11.0788,
+      "eval_samples_per_second": 45.131,
+      "eval_steps_per_second": 0.722,
+      "step": 1100
+    },
+    {
+      "epoch": 2.4025974025974026,
+      "grad_norm": 1.735385537147522,
+      "learning_rate": 2.7171945701357466e-05,
+      "loss": 0.3141,
+      "step": 1110
+    },
+    {
+      "epoch": 2.4242424242424243,
+      "grad_norm": 1.3764822483062744,
+      "learning_rate": 2.6945701357466062e-05,
+      "loss": 0.3398,
+      "step": 1120
+    },
+    {
+      "epoch": 2.445887445887446,
+      "grad_norm": 1.189744234085083,
+      "learning_rate": 2.671945701357466e-05,
+      "loss": 0.3216,
+      "step": 1130
+    },
+    {
+      "epoch": 2.4675324675324677,
+      "grad_norm": 1.2942255735397339,
+      "learning_rate": 2.649321266968326e-05,
+      "loss": 0.3345,
+      "step": 1140
+    },
+    {
+      "epoch": 2.4891774891774894,
+      "grad_norm": 1.3467974662780762,
+      "learning_rate": 2.6266968325791857e-05,
+      "loss": 0.3554,
+      "step": 1150
+    },
+    {
+      "epoch": 2.5108225108225106,
+      "grad_norm": 1.4570668935775757,
+      "learning_rate": 2.6040723981900456e-05,
+      "loss": 0.328,
+      "step": 1160
+    },
+    {
+      "epoch": 2.5324675324675323,
+      "grad_norm": 1.5922597646713257,
+      "learning_rate": 2.5814479638009052e-05,
+      "loss": 0.3139,
+      "step": 1170
+    },
+    {
+      "epoch": 2.554112554112554,
+      "grad_norm": 1.2003611326217651,
+      "learning_rate": 2.5588235294117645e-05,
+      "loss": 0.2989,
+      "step": 1180
+    },
+    {
+      "epoch": 2.5757575757575757,
+      "grad_norm": 2.1363961696624756,
+      "learning_rate": 2.5361990950226244e-05,
+      "loss": 0.3263,
+      "step": 1190
+    },
+    {
+      "epoch": 2.5974025974025974,
+      "grad_norm": 1.5996774435043335,
+      "learning_rate": 2.5135746606334844e-05,
+      "loss": 0.3578,
+      "step": 1200
+    },
+    {
+      "epoch": 2.5974025974025974,
+      "eval_loss": 0.09967260807752609,
+      "eval_runtime": 11.07,
+      "eval_samples_per_second": 45.167,
+      "eval_steps_per_second": 0.723,
+      "step": 1200
+    },
+    {
+      "epoch": 2.619047619047619,
+      "grad_norm": 1.1521668434143066,
+      "learning_rate": 2.490950226244344e-05,
+      "loss": 0.33,
+      "step": 1210
+    },
+    {
+      "epoch": 2.6406926406926408,
+      "grad_norm": 1.1272907257080078,
+      "learning_rate": 2.468325791855204e-05,
+      "loss": 0.3119,
+      "step": 1220
+    },
+    {
+      "epoch": 2.6623376623376624,
+      "grad_norm": 1.270735263824463,
+      "learning_rate": 2.4457013574660635e-05,
+      "loss": 0.2898,
+      "step": 1230
+    },
+    {
+      "epoch": 2.683982683982684,
+      "grad_norm": 1.0944297313690186,
+      "learning_rate": 2.423076923076923e-05,
+      "loss": 0.3235,
+      "step": 1240
+    },
+    {
+      "epoch": 2.7056277056277054,
+      "grad_norm": 1.8073984384536743,
+      "learning_rate": 2.400452488687783e-05,
+      "loss": 0.3529,
+      "step": 1250
+    },
+    {
+      "epoch": 2.7272727272727275,
+      "grad_norm": 1.573345422744751,
+      "learning_rate": 2.3778280542986426e-05,
+      "loss": 0.3374,
+      "step": 1260
+    },
+    {
+      "epoch": 2.7489177489177488,
+      "grad_norm": 1.2758445739746094,
+      "learning_rate": 2.3552036199095022e-05,
+      "loss": 0.3111,
+      "step": 1270
+    },
+    {
+      "epoch": 2.7705627705627704,
+      "grad_norm": 1.4875160455703735,
+      "learning_rate": 2.3325791855203622e-05,
+      "loss": 0.3152,
+      "step": 1280
+    },
+    {
+      "epoch": 2.792207792207792,
+      "grad_norm": 1.3437527418136597,
+      "learning_rate": 2.309954751131222e-05,
+      "loss": 0.3366,
+      "step": 1290
+    },
+    {
+      "epoch": 2.813852813852814,
+      "grad_norm": 0.9686049818992615,
+      "learning_rate": 2.2873303167420814e-05,
+      "loss": 0.3073,
+      "step": 1300
+    },
+    {
+      "epoch": 2.813852813852814,
+      "eval_loss": 0.11263589560985565,
+      "eval_runtime": 11.0755,
+      "eval_samples_per_second": 45.145,
+      "eval_steps_per_second": 0.722,
+      "step": 1300
+    },
+    {
+      "epoch": 2.8354978354978355,
+      "grad_norm": 1.654786467552185,
+      "learning_rate": 2.2647058823529413e-05,
+      "loss": 0.3259,
+      "step": 1310
+    },
+    {
+      "epoch": 2.857142857142857,
+      "grad_norm": 1.2647713422775269,
+      "learning_rate": 2.2420814479638013e-05,
+      "loss": 0.3261,
+      "step": 1320
+    },
+    {
+      "epoch": 2.878787878787879,
+      "grad_norm": 1.293813943862915,
+      "learning_rate": 2.2194570135746605e-05,
+      "loss": 0.3361,
+      "step": 1330
+    },
+    {
+      "epoch": 2.9004329004329006,
+      "grad_norm": 1.320722222328186,
+      "learning_rate": 2.1968325791855205e-05,
+      "loss": 0.3197,
+      "step": 1340
+    },
+    {
+      "epoch": 2.9220779220779223,
+      "grad_norm": 1.2165260314941406,
+      "learning_rate": 2.1742081447963804e-05,
+      "loss": 0.3408,
+      "step": 1350
+    },
+    {
+      "epoch": 2.9437229437229435,
+      "grad_norm": 1.154363989830017,
+      "learning_rate": 2.15158371040724e-05,
+      "loss": 0.3594,
+      "step": 1360
+    },
+    {
+      "epoch": 2.965367965367965,
+      "grad_norm": 1.7927467823028564,
+      "learning_rate": 2.1289592760180996e-05,
+      "loss": 0.3399,
+      "step": 1370
+    },
+    {
+      "epoch": 2.987012987012987,
+      "grad_norm": 2.399272918701172,
+      "learning_rate": 2.1063348416289595e-05,
+      "loss": 0.3276,
+      "step": 1380
+    },
+    {
+      "epoch": 3.0086580086580086,
+      "grad_norm": 1.5982075929641724,
+      "learning_rate": 2.083710407239819e-05,
+      "loss": 0.2852,
+      "step": 1390
+    },
+    {
+      "epoch": 3.0303030303030303,
+      "grad_norm": 1.479366660118103,
+      "learning_rate": 2.0610859728506787e-05,
+      "loss": 0.2805,
+      "step": 1400
+    },
+    {
+      "epoch": 3.0303030303030303,
+      "eval_loss": 0.09813899546861649,
+      "eval_runtime": 11.2539,
+      "eval_samples_per_second": 44.429,
+      "eval_steps_per_second": 0.711,
+      "step": 1400
+    },
+    {
+      "epoch": 3.051948051948052,
+      "grad_norm": 1.197197437286377,
+      "learning_rate": 2.0384615384615387e-05,
+      "loss": 0.2819,
+      "step": 1410
+    },
+    {
+      "epoch": 3.0735930735930737,
+      "grad_norm": 1.2043474912643433,
+      "learning_rate": 2.0158371040723983e-05,
+      "loss": 0.2739,
+      "step": 1420
+    },
+    {
+      "epoch": 3.0952380952380953,
+      "grad_norm": 1.305875301361084,
+      "learning_rate": 1.9932126696832582e-05,
+      "loss": 0.2823,
+      "step": 1430
+    },
+    {
+      "epoch": 3.116883116883117,
+      "grad_norm": 1.8753947019577026,
+      "learning_rate": 1.9705882352941178e-05,
+      "loss": 0.3127,
+      "step": 1440
+    },
+    {
+      "epoch": 3.1385281385281387,
+      "grad_norm": 1.1929014921188354,
+      "learning_rate": 1.9479638009049774e-05,
+      "loss": 0.2834,
+      "step": 1450
+    },
+    {
+      "epoch": 3.16017316017316,
+      "grad_norm": 1.4350407123565674,
+      "learning_rate": 1.9253393665158374e-05,
+      "loss": 0.2847,
+      "step": 1460
+    },
+    {
+      "epoch": 3.1818181818181817,
+      "grad_norm": 1.4549144506454468,
+      "learning_rate": 1.902714932126697e-05,
+      "loss": 0.2891,
+      "step": 1470
+    },
+    {
+      "epoch": 3.2034632034632033,
+      "grad_norm": 1.4221315383911133,
+      "learning_rate": 1.8800904977375566e-05,
+      "loss": 0.2947,
+      "step": 1480
+    },
+    {
+      "epoch": 3.225108225108225,
+      "grad_norm": 1.5852824449539185,
+      "learning_rate": 1.8574660633484165e-05,
+      "loss": 0.2988,
+      "step": 1490
+    },
+    {
+      "epoch": 3.2467532467532467,
+      "grad_norm": 2.2345638275146484,
+      "learning_rate": 1.834841628959276e-05,
+      "loss": 0.3131,
+      "step": 1500
+    },
+    {
+      "epoch": 3.2467532467532467,
+      "eval_loss": 0.10453030467033386,
+      "eval_runtime": 11.0226,
+      "eval_samples_per_second": 45.361,
+      "eval_steps_per_second": 0.726,
+      "step": 1500
+    },
+    {
+      "epoch": 3.2683982683982684,
+      "grad_norm": 1.0346052646636963,
+      "learning_rate": 1.8122171945701357e-05,
+      "loss": 0.2779,
+      "step": 1510
+    },
+    {
+      "epoch": 3.29004329004329,
+      "grad_norm": 1.4983434677124023,
+      "learning_rate": 1.7895927601809956e-05,
+      "loss": 0.3022,
+      "step": 1520
+    },
+    {
+      "epoch": 3.311688311688312,
+      "grad_norm": 1.3768881559371948,
+      "learning_rate": 1.7669683257918552e-05,
+      "loss": 0.2966,
+      "step": 1530
+    },
+    {
+      "epoch": 3.3333333333333335,
+      "grad_norm": 2.331627130508423,
+      "learning_rate": 1.744343891402715e-05,
+      "loss": 0.2875,
+      "step": 1540
+    },
+    {
+      "epoch": 3.354978354978355,
+      "grad_norm": 1.2236874103546143,
+      "learning_rate": 1.7217194570135748e-05,
+      "loss": 0.277,
+      "step": 1550
+    },
+    {
+      "epoch": 3.3766233766233764,
+      "grad_norm": 1.3123564720153809,
+      "learning_rate": 1.6990950226244347e-05,
+      "loss": 0.2849,
+      "step": 1560
+    },
+    {
+      "epoch": 3.398268398268398,
+      "grad_norm": 1.8533962965011597,
+      "learning_rate": 1.676470588235294e-05,
+      "loss": 0.2907,
+      "step": 1570
+    },
+    {
+      "epoch": 3.41991341991342,
+      "grad_norm": 1.4100937843322754,
+      "learning_rate": 1.653846153846154e-05,
+      "loss": 0.2879,
+      "step": 1580
+    },
+    {
+      "epoch": 3.4415584415584415,
+      "grad_norm": 1.474001169204712,
+      "learning_rate": 1.631221719457014e-05,
+      "loss": 0.2824,
+      "step": 1590
+    },
+    {
+      "epoch": 3.463203463203463,
+      "grad_norm": 1.746888518333435,
+      "learning_rate": 1.6085972850678734e-05,
+      "loss": 0.2926,
+      "step": 1600
+    },
+    {
+      "epoch": 3.463203463203463,
+      "eval_loss": 0.10039258748292923,
+      "eval_runtime": 11.0205,
+      "eval_samples_per_second": 45.37,
+      "eval_steps_per_second": 0.726,
+      "step": 1600
+    },
+    {
+      "epoch": 3.484848484848485,
+      "grad_norm": 1.3926804065704346,
+      "learning_rate": 1.585972850678733e-05,
+      "loss": 0.2806,
+      "step": 1610
+    },
+    {
+      "epoch": 3.5064935064935066,
+      "grad_norm": 1.2975285053253174,
+      "learning_rate": 1.563348416289593e-05,
+      "loss": 0.2834,
+      "step": 1620
+    },
+    {
+      "epoch": 3.5281385281385282,
+      "grad_norm": 1.4987730979919434,
+      "learning_rate": 1.5407239819004526e-05,
+      "loss": 0.2877,
+      "step": 1630
+    },
+    {
+      "epoch": 3.54978354978355,
+      "grad_norm": 1.3945181369781494,
+      "learning_rate": 1.5180995475113122e-05,
+      "loss": 0.2571,
+      "step": 1640
+    },
+    {
+      "epoch": 3.571428571428571,
+      "grad_norm": 1.3280750513076782,
+      "learning_rate": 1.495475113122172e-05,
+      "loss": 0.2687,
+      "step": 1650
+    },
+    {
+      "epoch": 3.5930735930735933,
+      "grad_norm": 1.7335435152053833,
+      "learning_rate": 1.4728506787330317e-05,
+      "loss": 0.3212,
+      "step": 1660
+    },
+    {
+      "epoch": 3.6147186147186146,
+      "grad_norm": 1.2388490438461304,
+      "learning_rate": 1.4502262443438917e-05,
+      "loss": 0.2858,
+      "step": 1670
+    },
+    {
+      "epoch": 3.6363636363636362,
+      "grad_norm": 2.562535524368286,
+      "learning_rate": 1.4276018099547511e-05,
+      "loss": 0.2742,
+      "step": 1680
+    },
+    {
+      "epoch": 3.658008658008658,
+      "grad_norm": 1.321783185005188,
+      "learning_rate": 1.4049773755656109e-05,
+      "loss": 0.304,
+      "step": 1690
+    },
+    {
+      "epoch": 3.6796536796536796,
+      "grad_norm": 1.445235013961792,
+      "learning_rate": 1.3823529411764708e-05,
+      "loss": 0.3193,
+      "step": 1700
+    },
+    {
+      "epoch": 3.6796536796536796,
+      "eval_loss": 0.1012435331940651,
+      "eval_runtime": 11.0758,
+      "eval_samples_per_second": 45.144,
+      "eval_steps_per_second": 0.722,
+      "step": 1700
+    },
+    {
+      "epoch": 3.7012987012987013,
+      "grad_norm": 1.4029107093811035,
+      "learning_rate": 1.3597285067873302e-05,
+      "loss": 0.2957,
+      "step": 1710
+    },
+    {
+      "epoch": 3.722943722943723,
+      "grad_norm": 1.5768345594406128,
+      "learning_rate": 1.33710407239819e-05,
+      "loss": 0.2886,
+      "step": 1720
+    },
+    {
+      "epoch": 3.7445887445887447,
+      "grad_norm": 1.216724157333374,
+      "learning_rate": 1.31447963800905e-05,
+      "loss": 0.2843,
+      "step": 1730
+    },
+    {
+      "epoch": 3.7662337662337664,
+      "grad_norm": 1.4409873485565186,
+      "learning_rate": 1.2918552036199097e-05,
+      "loss": 0.2827,
+      "step": 1740
+    },
+    {
+      "epoch": 3.787878787878788,
+      "grad_norm": 1.0664066076278687,
+      "learning_rate": 1.2692307692307691e-05,
+      "loss": 0.2821,
+      "step": 1750
+    },
+    {
+      "epoch": 3.8095238095238093,
+      "grad_norm": 1.540581464767456,
+      "learning_rate": 1.246606334841629e-05,
+      "loss": 0.2689,
+      "step": 1760
+    },
+    {
+      "epoch": 3.8311688311688314,
+      "grad_norm": 1.1569033861160278,
+      "learning_rate": 1.2239819004524887e-05,
+      "loss": 0.2708,
+      "step": 1770
+    },
+    {
+      "epoch": 3.8528138528138527,
+      "grad_norm": 1.833857536315918,
+      "learning_rate": 1.2013574660633485e-05,
+      "loss": 0.3016,
+      "step": 1780
+    },
+    {
+      "epoch": 3.8744588744588744,
+      "grad_norm": 2.1259467601776123,
+      "learning_rate": 1.1787330316742082e-05,
+      "loss": 0.301,
+      "step": 1790
+    },
+    {
+      "epoch": 3.896103896103896,
+      "grad_norm": 1.0809746980667114,
+      "learning_rate": 1.156108597285068e-05,
+      "loss": 0.2567,
+      "step": 1800
+    },
+    {
+      "epoch": 3.896103896103896,
+      "eval_loss": 0.09367834776639938,
+      "eval_runtime": 11.1921,
+      "eval_samples_per_second": 44.674,
+      "eval_steps_per_second": 0.715,
+      "step": 1800
+    },
+    {
+      "epoch": 3.9177489177489178,
+      "grad_norm": 2.1658575534820557,
+      "learning_rate": 1.1334841628959276e-05,
+      "loss": 0.3016,
+      "step": 1810
+    },
+    {
+      "epoch": 3.9393939393939394,
+      "grad_norm": 1.4649549722671509,
+      "learning_rate": 1.1108597285067874e-05,
+      "loss": 0.2925,
+      "step": 1820
+    },
+    {
+      "epoch": 3.961038961038961,
+      "grad_norm": 1.6620761156082153,
+      "learning_rate": 1.0882352941176471e-05,
+      "loss": 0.2803,
+      "step": 1830
+    },
+    {
+      "epoch": 3.982683982683983,
+      "grad_norm": 1.129351258277893,
+      "learning_rate": 1.0656108597285067e-05,
+      "loss": 0.2821,
+      "step": 1840
+    },
+    {
+      "epoch": 4.004329004329004,
+      "grad_norm": 1.558613896369934,
+      "learning_rate": 1.0429864253393667e-05,
+      "loss": 0.2892,
+      "step": 1850
+    },
+    {
+      "epoch": 4.025974025974026,
+      "grad_norm": 1.2767349481582642,
+      "learning_rate": 1.0203619909502263e-05,
+      "loss": 0.2511,
+      "step": 1860
+    },
+    {
+      "epoch": 4.0476190476190474,
+      "grad_norm": 1.3410160541534424,
+      "learning_rate": 9.97737556561086e-06,
+      "loss": 0.2652,
+      "step": 1870
+    },
+    {
+      "epoch": 4.06926406926407,
+      "grad_norm": 1.3820221424102783,
+      "learning_rate": 9.751131221719458e-06,
+      "loss": 0.2781,
+      "step": 1880
+    },
+    {
+      "epoch": 4.090909090909091,
+      "grad_norm": 1.479778528213501,
+      "learning_rate": 9.524886877828054e-06,
+      "loss": 0.248,
+      "step": 1890
+    },
+    {
+      "epoch": 4.112554112554113,
+      "grad_norm": 1.3408429622650146,
+      "learning_rate": 9.298642533936652e-06,
+      "loss": 0.2416,
+      "step": 1900
+    },
+    {
+      "epoch": 4.112554112554113,
+      "eval_loss": 0.10034486651420593,
+      "eval_runtime": 11.0609,
+      "eval_samples_per_second": 45.204,
+      "eval_steps_per_second": 0.723,
+      "step": 1900
+    },
+    {
+      "epoch": 4.134199134199134,
+      "grad_norm": 1.6059556007385254,
+      "learning_rate": 9.07239819004525e-06,
+      "loss": 0.25,
+      "step": 1910
+    },
+    {
+      "epoch": 4.1558441558441555,
+      "grad_norm": 1.424112319946289,
+      "learning_rate": 8.846153846153847e-06,
+      "loss": 0.2625,
+      "step": 1920
+    },
+    {
+      "epoch": 4.177489177489178,
+      "grad_norm": 1.638130784034729,
+      "learning_rate": 8.619909502262443e-06,
+      "loss": 0.2623,
+      "step": 1930
+    },
+    {
+      "epoch": 4.199134199134199,
+      "grad_norm": 1.5797659158706665,
+      "learning_rate": 8.393665158371041e-06,
+      "loss": 0.2646,
+      "step": 1940
+    },
+    {
+      "epoch": 4.220779220779221,
+      "grad_norm": 1.5362834930419922,
+      "learning_rate": 8.167420814479639e-06,
+      "loss": 0.2544,
+      "step": 1950
+    },
+    {
+      "epoch": 4.242424242424242,
+      "grad_norm": 1.6381897926330566,
+      "learning_rate": 7.941176470588235e-06,
+      "loss": 0.255,
+      "step": 1960
+    },
+    {
+      "epoch": 4.264069264069264,
+      "grad_norm": 1.8974134922027588,
+      "learning_rate": 7.714932126696834e-06,
+      "loss": 0.2762,
+      "step": 1970
+    },
+    {
+      "epoch": 4.285714285714286,
+      "grad_norm": 1.830533504486084,
+      "learning_rate": 7.48868778280543e-06,
+      "loss": 0.2733,
+      "step": 1980
+    },
+    {
+      "epoch": 4.307359307359308,
+      "grad_norm": 1.5712575912475586,
+      "learning_rate": 7.262443438914028e-06,
+      "loss": 0.256,
+      "step": 1990
+    },
+    {
+      "epoch": 4.329004329004329,
+      "grad_norm": 1.6964036226272583,
+      "learning_rate": 7.0361990950226245e-06,
+      "loss": 0.2512,
+      "step": 2000
+    },
+    {
+      "epoch": 4.329004329004329,
+      "eval_loss": 0.09686653316020966,
+      "eval_runtime": 11.0207,
+      "eval_samples_per_second": 45.369,
+      "eval_steps_per_second": 0.726,
+      "step": 2000
+    },
+    {
+      "epoch": 4.35064935064935,
+      "grad_norm": 1.8361965417861938,
+      "learning_rate": 6.809954751131221e-06,
+      "loss": 0.243,
+      "step": 2010
+    },
+    {
+      "epoch": 4.372294372294372,
+      "grad_norm": 1.6544777154922485,
+      "learning_rate": 6.58371040723982e-06,
+      "loss": 0.2711,
+      "step": 2020
+    },
+    {
+      "epoch": 4.393939393939394,
+      "grad_norm": 1.1742023229599,
+      "learning_rate": 6.357466063348416e-06,
+      "loss": 0.2406,
+      "step": 2030
+    },
+    {
+      "epoch": 4.415584415584416,
+      "grad_norm": 1.4244996309280396,
+      "learning_rate": 6.131221719457014e-06,
+      "loss": 0.243,
+      "step": 2040
+    },
+    {
+      "epoch": 4.437229437229437,
+      "grad_norm": 1.6553417444229126,
+      "learning_rate": 5.904977375565611e-06,
+      "loss": 0.2433,
+      "step": 2050
+    },
+    {
+      "epoch": 4.458874458874459,
+      "grad_norm": 1.690132737159729,
+      "learning_rate": 5.678733031674208e-06,
+      "loss": 0.2563,
+      "step": 2060
+    },
+    {
+      "epoch": 4.48051948051948,
+      "grad_norm": 1.625807762145996,
+      "learning_rate": 5.452488687782806e-06,
+      "loss": 0.2795,
+      "step": 2070
+    },
+    {
+      "epoch": 4.5021645021645025,
+      "grad_norm": 1.5032762289047241,
+      "learning_rate": 5.226244343891403e-06,
+      "loss": 0.2638,
+      "step": 2080
+    },
+    {
+      "epoch": 4.523809523809524,
+      "grad_norm": 1.5924805402755737,
+      "learning_rate": 5e-06,
+      "loss": 0.2327,
+      "step": 2090
+    },
+    {
+      "epoch": 4.545454545454545,
+      "grad_norm": 1.765118956565857,
+      "learning_rate": 4.773755656108597e-06,
+      "loss": 0.235,
+      "step": 2100
+    },
+    {
+      "epoch": 4.545454545454545,
+      "eval_loss": 0.10021113604307175,
+      "eval_runtime": 11.006,
+      "eval_samples_per_second": 45.43,
+      "eval_steps_per_second": 0.727,
+      "step": 2100
+    },
+    {
+      "epoch": 4.567099567099567,
+      "grad_norm": 1.4232510328292847,
+      "learning_rate": 4.547511312217195e-06,
+      "loss": 0.2444,
+      "step": 2110
+    },
+    {
+      "epoch": 4.588744588744589,
+      "grad_norm": 1.6832308769226074,
+      "learning_rate": 4.321266968325792e-06,
+      "loss": 0.282,
+      "step": 2120
+    },
+    {
+      "epoch": 4.6103896103896105,
+      "grad_norm": 1.4508180618286133,
+      "learning_rate": 4.0950226244343895e-06,
+      "loss": 0.2541,
+      "step": 2130
+    },
+    {
+      "epoch": 4.632034632034632,
+      "grad_norm": 1.5618845224380493,
+      "learning_rate": 3.868778280542986e-06,
+      "loss": 0.2344,
+      "step": 2140
+    },
+    {
+      "epoch": 4.653679653679654,
+      "grad_norm": 1.700337529182434,
+      "learning_rate": 3.642533936651584e-06,
+      "loss": 0.2605,
+      "step": 2150
+    },
+    {
+      "epoch": 4.675324675324675,
+      "grad_norm": 1.8766915798187256,
+      "learning_rate": 3.416289592760181e-06,
+      "loss": 0.2657,
+      "step": 2160
+    },
+    {
+      "epoch": 4.696969696969697,
+      "grad_norm": 1.39222252368927,
+      "learning_rate": 3.190045248868778e-06,
+      "loss": 0.2724,
+      "step": 2170
+    },
+    {
+      "epoch": 4.7186147186147185,
+      "grad_norm": 1.7200310230255127,
+      "learning_rate": 2.9638009049773754e-06,
+      "loss": 0.2473,
+      "step": 2180
+    },
+    {
+      "epoch": 4.740259740259741,
+      "grad_norm": 1.4263789653778076,
+      "learning_rate": 2.737556561085973e-06,
+      "loss": 0.2735,
+      "step": 2190
+    },
+    {
+      "epoch": 4.761904761904762,
+      "grad_norm": 1.4629650115966797,
+      "learning_rate": 2.5113122171945704e-06,
+      "loss": 0.2453,
+      "step": 2200
+    },
+    {
+      "epoch": 4.761904761904762,
+      "eval_loss": 0.09669920057058334,
+      "eval_runtime": 11.2945,
+      "eval_samples_per_second": 44.269,
+      "eval_steps_per_second": 0.708,
+      "step": 2200
+    },
+    {
+      "epoch": 4.783549783549784,
+      "grad_norm": 1.5298959016799927,
+      "learning_rate": 2.2850678733031673e-06,
+      "loss": 0.2527,
+      "step": 2210
+    },
+    {
+      "epoch": 4.805194805194805,
+      "grad_norm": 1.4086050987243652,
+      "learning_rate": 2.058823529411765e-06,
+      "loss": 0.2351,
+      "step": 2220
+    },
+    {
+      "epoch": 4.8268398268398265,
+      "grad_norm": 1.6125555038452148,
+      "learning_rate": 1.8325791855203622e-06,
+      "loss": 0.272,
+      "step": 2230
+    },
+    {
+      "epoch": 4.848484848484849,
+      "grad_norm": 1.69878089427948,
+      "learning_rate": 1.6063348416289593e-06,
+      "loss": 0.2545,
+      "step": 2240
+    },
+    {
+      "epoch": 4.87012987012987,
+      "grad_norm": 1.387439250946045,
+      "learning_rate": 1.3800904977375566e-06,
+      "loss": 0.2617,
+      "step": 2250
+    },
+    {
+      "epoch": 4.891774891774892,
+      "grad_norm": 1.551837682723999,
+      "learning_rate": 1.153846153846154e-06,
+      "loss": 0.2384,
+      "step": 2260
+    },
+    {
+      "epoch": 4.913419913419913,
+      "grad_norm": 1.6093473434448242,
+      "learning_rate": 9.276018099547512e-07,
+      "loss": 0.2494,
+      "step": 2270
+    },
+    {
+      "epoch": 4.935064935064935,
+      "grad_norm": 1.4138660430908203,
+      "learning_rate": 7.013574660633485e-07,
+      "loss": 0.2371,
+      "step": 2280
+    },
+    {
+      "epoch": 4.956709956709957,
+      "grad_norm": 1.7874048948287964,
+      "learning_rate": 4.751131221719457e-07,
+      "loss": 0.2562,
+      "step": 2290
+    },
+    {
+      "epoch": 4.978354978354979,
+      "grad_norm": 1.470076322555542,
+      "learning_rate": 2.4886877828054297e-07,
+      "loss": 0.2483,
+      "step": 2300
+    },
+    {
+      "epoch": 4.978354978354979,
+      "eval_loss": 0.09722720086574554,
+      "eval_runtime": 11.0178,
+      "eval_samples_per_second": 45.381,
+      "eval_steps_per_second": 0.726,
+      "step": 2300
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 1.6692209243774414,
+      "learning_rate": 2.2624434389140274e-08,
+      "loss": 0.2477,
+      "step": 2310
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 2310,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.5716476802941583e+19,
+  "train_batch_size": 64,
+  "trial_name": null,
+  "trial_params": null
+}
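
The learning rates logged above are consistent with a linear warmup-then-decay schedule. The sketch below reproduces the last few logged values under stated assumptions: the peak learning rate of 5e-5 is inferred from the log (it is not stated anywhere in this diff), `warmup_steps=100` is taken from the training script in this commit, and each logged value is assumed to be the rate in effect for that step (the scheduler value at `step - 1`). The helper names `linear_lr`, `logged_lr` and `epoch_at` are hypothetical.

```python
import math

MAX_STEPS = 2310      # "max_steps" in trainer_state.json
NUM_EPOCHS = 5        # "num_train_epochs" in trainer_state.json
WARMUP_STEPS = 100    # warmup_steps in train_colqwenomni_model.py
PEAK_LR = 5e-5        # ASSUMPTION: inferred from the logged values, not stated in the diff


def linear_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=MAX_STEPS):
    """Linear warmup to peak_lr, then linear decay to zero at `total` steps."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


def logged_lr(step):
    """ASSUMPTION: the value logged at `step` is the scheduler value at step - 1."""
    return linear_lr(step - 1)


def epoch_at(step):
    """Fractional epoch, using MAX_STEPS / NUM_EPOCHS = 462 steps per epoch."""
    return step / (MAX_STEPS / NUM_EPOCHS)


if __name__ == "__main__":
    for step in (2210, 2300, 2310):
        print(step, epoch_at(step), logged_lr(step))
```

If the assumptions hold, the printed values should line up with the `epoch` and `learning_rate` fields of the corresponding log entries.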

checkpoint-2310/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a5ee6fb727a5b92509e0b07a43bb3b13192a4cd21f2c87960f9bd90ffb0b22fd
+size 5777

git_hash.txt ADDED

	@@ -0,0 +1 @@


+ feffe2baab62fd0855fe1ec3334fcecf3eff9f1a

merges.txt ADDED

The diff for this file is too large to render. See raw diff

preprocessor_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "chunk_length": 300,
+  "dither": 0.0,
+  "feature_extractor_type": "WhisperFeatureExtractor",
+  "feature_size": 128,
+  "hop_length": 160,
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_processor_type": "Qwen2VLImageProcessor",
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "max_pixels": 802816,
+  "merge_size": 2,
+  "min_pixels": 3136,
+  "n_fft": 400,
+  "n_samples": 4800000,
+  "nb_max_frames": 30000,
+  "padding_side": "right",
+  "padding_value": 0.0,
+  "patch_size": 14,
+  "processor_class": "ColQwen2_5OmniProcessor",
+  "return_attention_mask": true,
+  "sampling_rate": 16000,
+  "temporal_patch_size": 2
+}
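
The numeric fields in this config are related by simple arithmetic. A minimal sketch, assuming that each visual token after merging covers a square of `patch_size * merge_size` pixels per side (the usual description of Qwen2-VL-style image processors, not something this diff states), and using the Whisper feature-extractor relations `n_samples = chunk_length * sampling_rate` and `nb_max_frames = n_samples // hop_length`. The function name `visual_token_budget` is hypothetical.

```python
# Values copied from preprocessor_config.json above.
PATCH_SIZE = 14
MERGE_SIZE = 2
MIN_PIXELS = 3136
MAX_PIXELS = 802816
CHUNK_LENGTH_S = 300
SAMPLING_RATE = 16000
HOP_LENGTH = 160


def visual_token_budget(pixels, patch_size=PATCH_SIZE, merge_size=MERGE_SIZE):
    """ASSUMPTION: one merged token per (patch_size * merge_size)^2 pixels."""
    pixels_per_token = (patch_size * merge_size) ** 2  # 28 * 28 = 784
    return pixels // pixels_per_token


# Audio side: total samples in one chunk and the resulting frame count.
n_samples = CHUNK_LENGTH_S * SAMPLING_RATE
nb_max_frames = n_samples // HOP_LENGTH

if __name__ == "__main__":
    print("min visual tokens:", visual_token_budget(MIN_PIXELS))
    print("max visual tokens:", visual_token_budget(MAX_PIXELS))
    print("n_samples:", n_samples, "nb_max_frames:", nb_max_frames)
```

Under that assumption, `max_pixels` caps an image at 1024 merged tokens, and the audio values reproduce the `n_samples` and `nb_max_frames` fields in the config.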

special_tokens_map.json ADDED

	@@ -0,0 +1,38 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|AUDIO|>",
+    "<|audio_bos|>",
+    "<|audio_eos|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_bos|>",
+    "<|vision_eos|>",
+    "<|vision_pad|>",
+    "<|IMAGE|>",
+    "<|VIDEO|>"
+  ],
+  "audio_bos_token": "<|audio_bos|>",
+  "audio_eos_token": "<|audio_eos|>",
+  "audio_token": "<|AUDIO|>",
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<|IMAGE|>",
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "video_token": "<|VIDEO|>",
+  "vision_bos_token": "<|vision_bos|>",
+  "vision_eos_token": "<|vision_eos|>"
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8441917e39ae0244e06d704b95b3124795cec478e297f9afac39ba670d7e9d99
+size 11421870

tokenizer_config.json ADDED

	@@ -0,0 +1,222 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|AUDIO|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|audio_bos|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|audio_eos|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_bos|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_eos|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|IMAGE|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|VIDEO|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|AUDIO|>",
+    "<|audio_bos|>",
+    "<|audio_eos|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_bos|>",
+    "<|vision_eos|>",
+    "<|vision_pad|>",
+    "<|IMAGE|>",
+    "<|VIDEO|>"
+  ],
+  "audio_bos_token": "<|audio_bos|>",
+  "audio_eos_token": "<|audio_eos|>",
+  "audio_token": "<|AUDIO|>",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {
+    "audio_bos_token": "<|audio_bos|>",
+    "audio_eos_token": "<|audio_eos|>",
+    "audio_token": "<|AUDIO|>",
+    "image_token": "<|IMAGE|>",
+    "video_token": "<|VIDEO|>",
+    "vision_bos_token": "<|vision_bos|>",
+    "vision_eos_token": "<|vision_eos|>"
+  },
+  "image_token": "<|IMAGE|>",
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "processor_class": "ColQwen2_5OmniProcessor",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null,
+  "video_token": "<|VIDEO|>",
+  "vision_bos_token": "<|vision_bos|>",
+  "vision_eos_token": "<|vision_eos|>"
+}

train_colqwenomni_model.py ADDED

	@@ -0,0 +1,103 @@

+import argparse
+import shutil
+from pathlib import Path
+
+import torch
+from datasets import load_dataset
+from peft import LoraConfig
+from transformers import TrainingArguments
+
+from colpali_engine.data.dataset import ColPaliEngineDataset
+from colpali_engine.loss.late_interaction_losses import ColbertLoss, ColbertPairwiseCELoss
+from colpali_engine.models import ColQwen2_5Omni, ColQwen2_5OmniProcessor
+from colpali_engine.trainer.colmodel_torch_training import ColModelTorchTraining
+from colpali_engine.trainer.colmodel_training import ColModelTraining, ColModelTrainingConfig
+
+
+def parse_args():
+    p = argparse.ArgumentParser()
+    p.add_argument("--output-dir", type=str, required=True, help="where to write model + script copy")
+    p.add_argument("--lr", type=float, default=1e-4, help="learning rate")
+    p.add_argument("--tau", type=float, default=0.02, help="temperature for loss function")
+    p.add_argument("--trainer", type=str, default="hf", choices=["torch", "hf"], help="trainer to use")
+    p.add_argument("--loss", type=str, default="ce", choices=["ce", "pairwise"], help="loss function to use")
+    p.add_argument("--peft", action="store_true", help="use PEFT for training")
+    return p.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    if args.loss == "ce":
+        loss_func = ColbertLoss(
+            temperature=args.tau,
+            normalize_scores=True,
+            use_smooth_max=False,
+            pos_aware_negative_filtering=False,
+        )
+    elif args.loss == "pairwise":
+        loss_func = ColbertPairwiseCELoss(
+            normalize_scores=False,
+        )
+    else:
+        raise ValueError(f"Unknown loss function: {args.loss}")
+    config = ColModelTrainingConfig(
+        output_dir=args.output_dir,
+        processor=ColQwen2_5OmniProcessor.from_pretrained(
+            pretrained_model_name_or_path="./models/base_models/colqwen2.5omni-base",
+        ),
+        model=ColQwen2_5Omni.from_pretrained(
+            pretrained_model_name_or_path="./models/base_models/colqwen2.5omni-base",
+            torch_dtype=torch.bfloat16,
+            attn_implementation="flash_attention_2",
+        ),
+        train_dataset=ColPaliEngineDataset(
+            load_dataset("./data_dir/colpali_train_set", split="train"), pos_target_column_name="image"
+        ),
+        eval_dataset=ColPaliEngineDataset(
+            load_dataset("./data_dir/colpali_train_set", split="test"), pos_target_column_name="image"
+        ),
+        run_eval=True,
+        loss_func=loss_func,
+        tr_args=TrainingArguments(
+            output_dir=None,
+            overwrite_output_dir=True,
+            num_train_epochs=5,
+            per_device_train_batch_size=64,
+            gradient_checkpointing=True,
+            gradient_checkpointing_kwargs={"use_reentrant": False},
+            per_device_eval_batch_size=16,
+            eval_strategy="steps",
+            dataloader_num_workers=2,
+            save_steps=500,
+            logging_steps=10,
+            eval_steps=100,
+            warmup_steps=100,
+            learning_rate=args.lr,
+            save_total_limit=1,
+            dataloader_prefetch_factor=2,
+            dataloader_pin_memory=True,
+            dataloader_persistent_workers=True,
+        ),
+        peft_config=LoraConfig(
+            r=32,
+            lora_alpha=32,
+            lora_dropout=0.1,
+            init_lora_weights="gaussian",
+            bias="none",
+            task_type="FEATURE_EXTRACTION",
+            target_modules="(.*(model)(?!.*visual).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)",
+        )
+        if args.peft
+        else None,
+    )
+    config.model.audio_tower = torch.nn.Identity()  # Disable the audio tower
+    # config.model = torch.compile(config.model, dynamic=True, fullgraph=True, mode="max-autotune")
+    # make sure output_dir exists and copy script for provenance
+    Path(config.output_dir).mkdir(parents=True, exist_ok=True)
+    shutil.copy(Path(__file__), Path(config.output_dir) / Path(__file__).name)
+    trainer = ColModelTraining(config) if args.trainer == "hf" else ColModelTorchTraining(config)
+    trainer.train()
+    trainer.save()
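
The `target_modules` string in the `LoraConfig` above is a regular expression. A minimal sketch of which module names it selects, assuming PEFT applies a string `target_modules` as a full-match regex against each module name. The module names in `candidate_names` are hypothetical examples chosen for illustration, not names read from the actual model.

```python
import re

# Pattern copied verbatim from the LoraConfig in train_colqwenomni_model.py.
TARGET_MODULES = (
    r"(.*(model)(?!.*visual).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$"
    r"|.*(custom_text_proj).*$)"
)

# HYPOTHETICAL module names, for illustration only.
candidate_names = [
    "model.layers.0.self_attn.q_proj",    # language-model attention projection
    "model.layers.0.mlp.down_proj",       # language-model MLP projection
    "model.layers.0.input_layernorm",     # not a projection layer
    "model.visual.blocks.0.attn.q_proj",  # "visual" appears after "model"
    "custom_text_proj",                   # retrieval projection head
]


def is_lora_target(name):
    """True if the whole module name matches the target_modules pattern."""
    return re.fullmatch(TARGET_MODULES, name) is not None


selected = [name for name in candidate_names if is_lora_target(name)]

if __name__ == "__main__":
    for name in candidate_names:
        print(f"{name}: {'LoRA' if is_lora_target(name) else 'skipped'}")
```

The negative lookahead `(?!.*visual)` is what excludes projection layers whose path contains `visual` after `model`, while the second alternative adds `custom_text_proj` regardless of its position in the module tree.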

video_preprocessor_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "chunk_length": 300,
+  "crop_size": null,
+  "data_format": "channels_first",
+  "default_to_square": true,
+  "device": null,
+  "dither": 0.0,
+  "do_center_crop": null,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_pad": null,
+  "do_rescale": true,
+  "do_resize": true,
+  "do_sample_frames": false,
+  "feature_extractor_type": "WhisperFeatureExtractor",
+  "feature_size": 128,
+  "fps": null,
+  "hop_length": 160,
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_processor_type": "Qwen2VLImageProcessor",
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "input_data_format": null,
+  "max_frames": 768,
+  "max_pixels": 12845056,
+  "merge_size": 2,
+  "min_frames": 4,
+  "min_pixels": 3136,
+  "n_fft": 400,
+  "n_samples": 4800000,
+  "nb_max_frames": 30000,
+  "num_frames": null,
+  "padding_side": "right",
+  "padding_value": 0.0,
+  "patch_size": 14,
+  "processor_class": "ColQwen2_5OmniProcessor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "return_attention_mask": true,
+  "sampling_rate": 16000,
+  "size": {
+    "longest_edge": 12845056,
+    "shortest_edge": 3136
+  },
+  "size_divisor": null,
+  "temporal_patch_size": 2,
+  "video_metadata": null,
+  "video_processor_type": "Qwen2VLVideoProcessor"
+}
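
The `do_rescale` and `do_normalize` flags together with `rescale_factor`, `image_mean` and `image_std` describe a per-channel pixel transform. A minimal sketch of that transform on a single RGB pixel, assuming the conventional order of rescale first, then normalize. The function name `normalize_pixel` is hypothetical, and the token count reuses the assumption that one merged token covers `(patch_size * merge_size)^2` pixels.

```python
import math

# Values copied from video_preprocessor_config.json above.
RESCALE_FACTOR = 0.00392156862745098  # numerically 1 / 255
IMAGE_MEAN = [0.48145466, 0.4578275, 0.40821073]
IMAGE_STD = [0.26862954, 0.26130258, 0.27577711]
MAX_PIXELS = 12845056
PATCH_SIZE = 14
MERGE_SIZE = 2


def normalize_pixel(rgb):
    """Rescale 0..255 channel values to 0..1, then standardize each channel."""
    return [
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    ]


# ASSUMPTION: one merged token per (patch_size * merge_size)^2 pixels.
max_video_tokens = MAX_PIXELS // (PATCH_SIZE * MERGE_SIZE) ** 2

if __name__ == "__main__":
    print("black pixel:", normalize_pixel((0, 0, 0)))
    print("white pixel:", normalize_pixel((255, 255, 255)))
    print("max merged tokens for the video pixel budget:", max_video_tokens)
```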

vocab.json ADDED

The diff for this file is too large to render. See raw diff