Ben88 committed
Commit b6b96a4 · verified · 1 Parent(s): 922e2c9

Upload 15 files
README.md CHANGED
@@ -1,47 +1,202 @@
  ---
- base_model: Qwen/Qwen3-8B
  library_name: peft
- license: mit
  ---

  ### Direct Use

- #Load model

- ```
- from peft import PeftModel
- from transformers import AutoModelForCausalLM, AutoTokenizer

- base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
- model3 = PeftModel.from_pretrained(base_model, "TMAE-Triage/MedConsultLLM")
- tokenizer3 = AutoTokenizer.from_pretrained("TMAE-Triage/MedConsultLLM")

- inputs3 = tokenizer3("Complication: NKDA \n\n Patient SOB & RNA." + " <|expand|>", return_tensors="pt")
- outputs3 = model3.generate(input_ids=inputs3.input_ids, max_new_tokens=50)
- print(tokenizer3.decode(outputs3[0], skip_special_tokens=True))

- #Result:

- Complication: NKDA

- Patient SOB & RNA. <|expand|> Complication: no known drug allergies.

- Patient shortness of breath and ribonucleic acid.
- ```

  #### Metrics

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650275ef20246c6f9f4e74b1/rVfucAsQ0JLgVOelrCaHQ.png)

- ### Training Detail

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650275ef20246c6f9f4e74b1/pmskOxhOBrutdKu_c3FZv.png)

  ### Framework versions

  - PEFT 0.15.2

  ---
+ base_model: Qwen/Qwen3-32B
  library_name: peft
  ---

+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]

+ #### Training Hyperparameters

+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

+ #### Speeds, Sizes, Times [optional]

+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ [More Information Needed]

+ ## Evaluation

+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]

  #### Metrics

+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors [optional]

+ [More Information Needed]

+ ## Model Card Contact

+ [More Information Needed]
  ### Framework versions

  - PEFT 0.15.2
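The removed quick-start above shows the intended usage pattern: a clinical note followed by an `<|expand|>` control token, after which the model emits the note with abbreviations expanded (e.g. NKDA → no known drug allergies). A minimal sketch of just the prompt construction, independent of any model; the `EXPAND_TOKEN` constant and `build_prompt` helper are illustrative names, not part of the repository — only the `<|expand|>` token itself comes from the diff:

```python
# The <|expand|> token appears in the removed README's example prompt.
EXPAND_TOKEN = "<|expand|>"

def build_prompt(note: str) -> str:
    """Append the abbreviation-expansion control token to a clinical note."""
    return note + " " + EXPAND_TOKEN

prompt = build_prompt("Complication: NKDA \n\n Patient SOB & RNA.")
print(prompt)
```

The assembled string is what the removed example passed to the tokenizer before calling `generate`.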
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
  {
  "alpha_pattern": {},
  "auto_mapping": null,
- "base_model_name_or_path": "Qwen/Qwen3-8B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
@@ -24,10 +24,10 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "o_proj",
  "q_proj",
  "v_proj",
- "k_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,

  {
  "alpha_pattern": {},
  "auto_mapping": null,
+ "base_model_name_or_path": "Qwen/Qwen3-32B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,

  "rank_pattern": {},
  "revision": null,
  "target_modules": [
  "q_proj",
+ "k_proj",
  "v_proj",
+ "o_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
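The adapter_config.json change above does two things: it swaps the base checkpoint from Qwen/Qwen3-8B to Qwen/Qwen3-32B, and it lists all four attention projections as LoRA target modules (the module set is unchanged; only its order differs). A minimal stdlib sketch of applying the same edit programmatically — the `update_adapter_config` helper is a hypothetical name for illustration, not something in the repo:

```python
import json

def update_adapter_config(config: dict) -> dict:
    """Apply the same changes the diff above makes to adapter_config.json."""
    config = dict(config)  # leave the input untouched
    config["base_model_name_or_path"] = "Qwen/Qwen3-32B"
    # All four attention projections remain LoRA targets, now in q/k/v/o order.
    config["target_modules"] = ["q_proj", "k_proj", "v_proj", "o_proj"]
    return config

old = {
    "base_model_name_or_path": "Qwen/Qwen3-8B",
    "target_modules": ["o_proj", "q_proj", "v_proj", "k_proj"],
    "task_type": "CAUSAL_LM",
}
new = update_adapter_config(old)
print(json.dumps(new, indent=2))
```

Note that PEFT treats `target_modules` as a set of module-name suffixes, so the reordering alone does not change which layers get adapters.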
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e27db5e2383c2771890e5073626ff646df06ef2607a6cdfda404dfbf7050abe2
- size 30709192

  version https://git-lfs.github.com/spec/v1
+ oid sha256:b712f07b25f6d3059c623ccccb84b0ad93fcdf619df26bf6ee5a7b8e3c0c174d
+ size 79760200
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:388ebeaf77ae8ef3a743b4e98447f815f772a75cd5f6b8275c4d3553f4606016
- size 61583354

  version https://git-lfs.github.com/spec/v1
+ oid sha256:661747ed17bd07ad8384a6616d8395331e4741d9ce3e4ae7563197f55eb6a147
+ size 159814674
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1481c7f8158a49fc2211c2fa3809aad42a5de6192cc1b2af127c2864a6eda6c9
  size 14244

  version https://git-lfs.github.com/spec/v1
+ oid sha256:5e3b708c3b2e76466f3ac803dc0a7f21e15943a4392d9feca4425baf62611de9
  size 14244
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:769ae8e1907c467549704a04a542a28ae1480d439b7a7783f536463f0a5f0e3c
  size 1064

  version https://git-lfs.github.com/spec/v1
+ oid sha256:19b2629bc0bd8daaa99d91d86714f96b66f12a2bb2230a803ab28a5a80c623c3
  size 1064
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1574cf58b63a2a56db9bc28f6ddcac4ece87690840939153189077692486f4ee
- size 11422920

  version https://git-lfs.github.com/spec/v1
+ oid sha256:fbd5dd30a62db2f0ead71513492e40939dca4240dd5141e0a525212e2a45ff74
+ size 11422923
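The binary files in this commit are stored as Git LFS pointers: three plain-text lines giving the spec version, a `sha256` object id, and the size in bytes. A minimal stdlib sketch of parsing one such pointer (the `parse_lfs_pointer` helper name is illustrative; the pointer text is the new tokenizer.json pointer from the diff above):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its space-separated key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fbd5dd30a62db2f0ead71513492e40939dca4240dd5141e0a525212e2a45ff74
size 11422923"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # size of the actual blob in bytes, not of the pointer
```

The `CHANGED` sections above are therefore diffs of these tiny pointer files; the real payloads live in LFS storage, addressed by the `oid`.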
trainer_state.json CHANGED
@@ -1,187 +1,737 @@
  {
- "best_global_step": 720,
- "best_metric": 0.30677664279937744,
- "best_model_checkpoint": "./qwen_medical_reports_finetuned/checkpoint-720",
- "epoch": 9.0,
  "eval_steps": 500,
- "global_step": 720,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
- "epoch": 0.625,
- "grad_norm": 0.5614568591117859,
- "learning_rate": 9.69375e-05,
- "loss": 4.086,
  "step": 50
  },
  {
  "epoch": 1.0,
- "eval_loss": 0.7464421987533569,
- "eval_runtime": 2.8902,
- "eval_samples_per_second": 69.198,
- "eval_steps_per_second": 6.92,
- "step": 80
  },
  {
  "epoch": 1.25,
- "grad_norm": 0.4924369752407074,
- "learning_rate": 9.38125e-05,
- "loss": 0.7541,
- "step": 100
  },
  {
- "epoch": 1.875,
- "grad_norm": 0.3845207393169403,
- "learning_rate": 9.06875e-05,
- "loss": 0.5257,
- "step": 150
  },
  {
  "epoch": 2.0,
- "eval_loss": 0.5174583196640015,
- "eval_runtime": 2.8881,
- "eval_samples_per_second": 69.25,
- "eval_steps_per_second": 6.925,
- "step": 160
  },
  {
  "epoch": 2.5,
- "grad_norm": 0.5862274169921875,
- "learning_rate": 8.756250000000001e-05,
- "loss": 0.441,
- "step": 200
  },
  {
  "epoch": 3.0,
- "eval_loss": 0.4074719548225403,
- "eval_runtime": 2.8904,
- "eval_samples_per_second": 69.195,
- "eval_steps_per_second": 6.92,
- "step": 240
  },
  {
- "epoch": 3.125,
- "grad_norm": 0.6350286602973938,
- "learning_rate": 8.44375e-05,
- "loss": 0.3827,
- "step": 250
  },
  {
  "epoch": 3.75,
- "grad_norm": 0.7117815017700195,
- "learning_rate": 8.13125e-05,
- "loss": 0.3275,
- "step": 300
  },
  {
  "epoch": 4.0,
- "eval_loss": 0.34577617049217224,
- "eval_runtime": 2.8862,
- "eval_samples_per_second": 69.295,
- "eval_steps_per_second": 6.93,
- "step": 320
  },
  {
- "epoch": 4.375,
- "grad_norm": 0.643543004989624,
- "learning_rate": 7.81875e-05,
- "loss": 0.2982,
- "step": 350
  },
  {
  "epoch": 5.0,
- "grad_norm": 0.6653191447257996,
- "learning_rate": 7.50625e-05,
- "loss": 0.286,
- "step": 400
  },
  {
  "epoch": 5.0,
- "eval_loss": 0.3198166489601135,
- "eval_runtime": 2.889,
- "eval_samples_per_second": 69.228,
- "eval_steps_per_second": 6.923,
- "step": 400
  },
  {
- "epoch": 5.625,
- "grad_norm": 1.0326348543167114,
- "learning_rate": 7.193750000000001e-05,
- "loss": 0.273,
- "step": 450
  },
  {
  "epoch": 6.0,
- "eval_loss": 0.3160374164581299,
- "eval_runtime": 2.8861,
- "eval_samples_per_second": 69.297,
- "eval_steps_per_second": 6.93,
- "step": 480
  },
  {
  "epoch": 6.25,
- "grad_norm": 0.591899573802948,
- "learning_rate": 6.88125e-05,
- "loss": 0.2706,
- "step": 500
  },
  {
- "epoch": 6.875,
- "grad_norm": 0.6523064970970154,
- "learning_rate": 6.56875e-05,
- "loss": 0.2667,
- "step": 550
  },
  {
  "epoch": 7.0,
- "eval_loss": 0.3083480894565582,
- "eval_runtime": 2.888,
- "eval_samples_per_second": 69.252,
- "eval_steps_per_second": 6.925,
- "step": 560
  },
  {
  "epoch": 7.5,
- "grad_norm": 0.5645153522491455,
- "learning_rate": 6.25625e-05,
- "loss": 0.2618,
- "step": 600
  },
  {
  "epoch": 8.0,
- "eval_loss": 0.30845606327056885,
- "eval_runtime": 2.8881,
- "eval_samples_per_second": 69.249,
- "eval_steps_per_second": 6.925,
- "step": 640
  },
  {
- "epoch": 8.125,
- "grad_norm": 0.5773406624794006,
- "learning_rate": 5.94375e-05,
- "loss": 0.261,
- "step": 650
  },
  {
  "epoch": 8.75,
- "grad_norm": 0.62261563539505,
- "learning_rate": 5.63125e-05,
- "loss": 0.2568,
- "step": 700
  },
  {
  "epoch": 9.0,
- "eval_loss": 0.30677664279937744,
- "eval_runtime": 2.8866,
- "eval_samples_per_second": 69.286,
- "eval_steps_per_second": 6.929,
- "step": 720
  }
  ],
  "logging_steps": 50,
- "max_steps": 1600,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 20,
  "save_steps": 500,
@@ -192,12 +742,12 @@
  "should_evaluate": false,
  "should_log": false,
  "should_save": true,
- "should_training_stop": false
  },
  "attributes": {}
  }
  },
- "total_flos": 9.03310361690112e+16,
  "train_batch_size": 10,
  "trial_name": null,
  "trial_params": null
 
  {
+ "best_global_step": 1000,
+ "best_metric": 0.283452570438385,
+ "best_model_checkpoint": "./qwen_medical_reports_finetuned/checkpoint-1000",
+ "epoch": 20.0,
  "eval_steps": 500,
+ "global_step": 4000,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
+ "epoch": 0.25,
+ "grad_norm": 0.43099120259284973,
+ "learning_rate": 9.8775e-05,
+ "loss": 3.1377,
  "step": 50
  },
+ {
+ "epoch": 0.5,
+ "grad_norm": 0.3961893618106842,
+ "learning_rate": 9.7525e-05,
+ "loss": 0.708,
+ "step": 100
+ },
+ {
+ "epoch": 0.75,
+ "grad_norm": 0.43528544902801514,
+ "learning_rate": 9.627500000000001e-05,
+ "loss": 0.5117,
+ "step": 150
+ },
  {
  "epoch": 1.0,
+ "grad_norm": 0.552211582660675,
+ "learning_rate": 9.5025e-05,
+ "loss": 0.4371,
+ "step": 200
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 0.40502533316612244,
+ "eval_runtime": 34.9882,
+ "eval_samples_per_second": 14.291,
+ "eval_steps_per_second": 1.429,
+ "step": 200
  },
  {
  "epoch": 1.25,
+ "grad_norm": 0.6529857516288757,
+ "learning_rate": 9.3775e-05,
+ "loss": 0.3557,
+ "step": 250
  },
  {
+ "epoch": 1.5,
+ "grad_norm": 0.5617239475250244,
+ "learning_rate": 9.252500000000001e-05,
+ "loss": 0.3167,
+ "step": 300
+ },
+ {
+ "epoch": 1.75,
+ "grad_norm": 0.5521848201751709,
+ "learning_rate": 9.1275e-05,
+ "loss": 0.2987,
+ "step": 350
  },
  {
  "epoch": 2.0,
+ "grad_norm": 0.5572051405906677,
+ "learning_rate": 9.0025e-05,
+ "loss": 0.2876,
+ "step": 400
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 0.29635289311408997,
+ "eval_runtime": 34.9663,
+ "eval_samples_per_second": 14.3,
+ "eval_steps_per_second": 1.43,
+ "step": 400
+ },
+ {
+ "epoch": 2.25,
+ "grad_norm": 0.4148677587509155,
+ "learning_rate": 8.8775e-05,
+ "loss": 0.2775,
+ "step": 450
  },
  {
  "epoch": 2.5,
+ "grad_norm": 0.5877013206481934,
+ "learning_rate": 8.7525e-05,
+ "loss": 0.2775,
+ "step": 500
+ },
+ {
+ "epoch": 2.75,
+ "grad_norm": 0.46404188871383667,
+ "learning_rate": 8.627500000000001e-05,
+ "loss": 0.2757,
+ "step": 550
  },
  {
  "epoch": 3.0,
+ "grad_norm": 0.45785900950431824,
+ "learning_rate": 8.502499999999999e-05,
+ "loss": 0.2755,
+ "step": 600
  },
  {
+ "epoch": 3.0,
+ "eval_loss": 0.28746917843818665,
+ "eval_runtime": 34.9592,
+ "eval_samples_per_second": 14.302,
+ "eval_steps_per_second": 1.43,
+ "step": 600
+ },
+ {
+ "epoch": 3.25,
+ "grad_norm": 0.4644843637943268,
+ "learning_rate": 8.3775e-05,
+ "loss": 0.2711,
+ "step": 650
+ },
+ {
+ "epoch": 3.5,
+ "grad_norm": 0.43433475494384766,
+ "learning_rate": 8.252500000000001e-05,
+ "loss": 0.2712,
+ "step": 700
  },
  {
  "epoch": 3.75,
+ "grad_norm": 0.38372883200645447,
+ "learning_rate": 8.1275e-05,
+ "loss": 0.2713,
+ "step": 750
  },
  {
  "epoch": 4.0,
+ "grad_norm": 0.39236927032470703,
+ "learning_rate": 8.002500000000001e-05,
+ "loss": 0.2709,
+ "step": 800
  },
  {
+ "epoch": 4.0,
+ "eval_loss": 0.2852801978588104,
+ "eval_runtime": 34.9716,
+ "eval_samples_per_second": 14.297,
+ "eval_steps_per_second": 1.43,
+ "step": 800
+ },
+ {
+ "epoch": 4.25,
+ "grad_norm": 0.422234445810318,
+ "learning_rate": 7.8775e-05,
+ "loss": 0.2679,
+ "step": 850
+ },
+ {
+ "epoch": 4.5,
+ "grad_norm": 0.4149724543094635,
+ "learning_rate": 7.7525e-05,
+ "loss": 0.2682,
+ "step": 900
+ },
+ {
+ "epoch": 4.75,
+ "grad_norm": 0.37782400846481323,
+ "learning_rate": 7.627500000000001e-05,
+ "loss": 0.2683,
+ "step": 950
  },
  {
  "epoch": 5.0,
+ "grad_norm": 0.4044494926929474,
+ "learning_rate": 7.502500000000001e-05,
+ "loss": 0.2685,
+ "step": 1000
  },
  {
  "epoch": 5.0,
+ "eval_loss": 0.283452570438385,
+ "eval_runtime": 34.9579,
+ "eval_samples_per_second": 14.303,
+ "eval_steps_per_second": 1.43,
+ "step": 1000
  },
  {
+ "epoch": 5.25,
+ "grad_norm": 0.41983741521835327,
+ "learning_rate": 7.3775e-05,
+ "loss": 0.2649,
+ "step": 1050
+ },
+ {
+ "epoch": 5.5,
+ "grad_norm": 0.3734280467033386,
+ "learning_rate": 7.2525e-05,
+ "loss": 0.2661,
+ "step": 1100
+ },
+ {
+ "epoch": 5.75,
+ "grad_norm": 0.40924379229545593,
+ "learning_rate": 7.1275e-05,
+ "loss": 0.2659,
+ "step": 1150
+ },
+ {
+ "epoch": 6.0,
+ "grad_norm": 0.3856299817562103,
+ "learning_rate": 7.002500000000001e-05,
+ "loss": 0.2656,
+ "step": 1200
  },
  {
  "epoch": 6.0,
+ "eval_loss": 0.28378918766975403,
+ "eval_runtime": 34.9635,
+ "eval_samples_per_second": 14.301,
+ "eval_steps_per_second": 1.43,
+ "step": 1200
  },
  {
  "epoch": 6.25,
+ "grad_norm": 0.5564974546432495,
+ "learning_rate": 6.8775e-05,
+ "loss": 0.2628,
+ "step": 1250
  },
  {
+ "epoch": 6.5,
+ "grad_norm": 0.47460582852363586,
+ "learning_rate": 6.7525e-05,
+ "loss": 0.2632,
+ "step": 1300
+ },
+ {
+ "epoch": 6.75,
+ "grad_norm": 0.39553120732307434,
+ "learning_rate": 6.6275e-05,
+ "loss": 0.2637,
+ "step": 1350
+ },
+ {
+ "epoch": 7.0,
+ "grad_norm": 0.3899823725223541,
+ "learning_rate": 6.502500000000001e-05,
+ "loss": 0.2643,
+ "step": 1400
  },
  {
  "epoch": 7.0,
+ "eval_loss": 0.28401800990104675,
+ "eval_runtime": 34.97,
+ "eval_samples_per_second": 14.298,
+ "eval_steps_per_second": 1.43,
+ "step": 1400
+ },
+ {
+ "epoch": 7.25,
+ "grad_norm": 0.4034035801887512,
+ "learning_rate": 6.3775e-05,
+ "loss": 0.2609,
+ "step": 1450
  },
  {
  "epoch": 7.5,
+ "grad_norm": 0.3818899691104889,
+ "learning_rate": 6.2525e-05,
+ "loss": 0.2622,
+ "step": 1500
+ },
+ {
+ "epoch": 7.75,
+ "grad_norm": 0.5591850876808167,
+ "learning_rate": 6.1275e-05,
+ "loss": 0.2614,
+ "step": 1550
  },
  {
  "epoch": 8.0,
+ "grad_norm": 0.38951823115348816,
+ "learning_rate": 6.0024999999999995e-05,
+ "loss": 0.2628,
+ "step": 1600
  },
  {
+ "epoch": 8.0,
+ "eval_loss": 0.2841391861438751,
+ "eval_runtime": 34.9774,
+ "eval_samples_per_second": 14.295,
+ "eval_steps_per_second": 1.429,
+ "step": 1600
+ },
+ {
+ "epoch": 8.25,
+ "grad_norm": 0.4582635164260864,
+ "learning_rate": 5.8775000000000006e-05,
+ "loss": 0.258,
+ "step": 1650
+ },
+ {
+ "epoch": 8.5,
+ "grad_norm": 0.44102761149406433,
+ "learning_rate": 5.752500000000001e-05,
+ "loss": 0.2584,
+ "step": 1700
  },
  {
  "epoch": 8.75,
+ "grad_norm": 0.44510433077812195,
+ "learning_rate": 5.6275e-05,
+ "loss": 0.2589,
+ "step": 1750
+ },
+ {
+ "epoch": 9.0,
+ "grad_norm": 0.41570335626602173,
+ "learning_rate": 5.5025e-05,
+ "loss": 0.2603,
+ "step": 1800
  },
  {
  "epoch": 9.0,
+ "eval_loss": 0.2841310203075409,
+ "eval_runtime": 34.9715,
+ "eval_samples_per_second": 14.297,
+ "eval_steps_per_second": 1.43,
+ "step": 1800
+ },
+ {
+ "epoch": 9.25,
+ "grad_norm": 0.45228293538093567,
+ "learning_rate": 5.3775e-05,
+ "loss": 0.2542,
+ "step": 1850
+ },
+ {
+ "epoch": 9.5,
+ "grad_norm": 0.45636293292045593,
+ "learning_rate": 5.2525e-05,
+ "loss": 0.256,
+ "step": 1900
+ },
+ {
+ "epoch": 9.75,
+ "grad_norm": 0.4527854919433594,
+ "learning_rate": 5.1275000000000006e-05,
+ "loss": 0.2575,
+ "step": 1950
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 0.4357495605945587,
+ "learning_rate": 5.0025e-05,
+ "loss": 0.2567,
+ "step": 2000
+ },
+ {
+ "epoch": 10.0,
+ "eval_loss": 0.28737518191337585,
+ "eval_runtime": 34.9799,
+ "eval_samples_per_second": 14.294,
+ "eval_steps_per_second": 1.429,
+ "step": 2000
+ },
+ {
+ "epoch": 10.25,
+ "grad_norm": 0.481499046087265,
+ "learning_rate": 4.8775000000000007e-05,
+ "loss": 0.2509,
+ "step": 2050
+ },
+ {
+ "epoch": 10.5,
+ "grad_norm": 0.474854439496994,
+ "learning_rate": 4.7525e-05,
+ "loss": 0.2522,
+ "step": 2100
+ },
+ {
+ "epoch": 10.75,
+ "grad_norm": 0.4796244204044342,
+ "learning_rate": 4.6275e-05,
+ "loss": 0.2538,
+ "step": 2150
+ },
+ {
+ "epoch": 11.0,
+ "grad_norm": 0.4484533667564392,
+ "learning_rate": 4.5025000000000003e-05,
+ "loss": 0.2542,
+ "step": 2200
+ },
+ {
+ "epoch": 11.0,
+ "eval_loss": 0.2882769703865051,
+ "eval_runtime": 34.9674,
+ "eval_samples_per_second": 14.299,
+ "eval_steps_per_second": 1.43,
+ "step": 2200
+ },
+ {
+ "epoch": 11.25,
+ "grad_norm": 0.601557195186615,
+ "learning_rate": 4.3775e-05,
+ "loss": 0.2463,
+ "step": 2250
+ },
+ {
+ "epoch": 11.5,
+ "grad_norm": 0.5119094848632812,
+ "learning_rate": 4.2525000000000004e-05,
+ "loss": 0.2485,
+ "step": 2300
+ },
+ {
+ "epoch": 11.75,
+ "grad_norm": 0.5228874683380127,
+ "learning_rate": 4.1275e-05,
+ "loss": 0.2493,
+ "step": 2350
+ },
+ {
+ "epoch": 12.0,
+ "grad_norm": 0.4916130304336548,
+ "learning_rate": 4.0025000000000004e-05,
+ "loss": 0.2498,
+ "step": 2400
+ },
+ {
+ "epoch": 12.0,
+ "eval_loss": 0.29168301820755005,
+ "eval_runtime": 34.9775,
+ "eval_samples_per_second": 14.295,
+ "eval_steps_per_second": 1.429,
+ "step": 2400
+ },
+ {
+ "epoch": 12.25,
+ "grad_norm": 0.5594790577888489,
+ "learning_rate": 3.8775e-05,
+ "loss": 0.2402,
+ "step": 2450
+ },
+ {
+ "epoch": 12.5,
+ "grad_norm": 0.5535025000572205,
+ "learning_rate": 3.7525e-05,
+ "loss": 0.2432,
+ "step": 2500
+ },
+ {
+ "epoch": 12.75,
+ "grad_norm": 0.6284328699111938,
+ "learning_rate": 3.6275e-05,
+ "loss": 0.245,
+ "step": 2550
+ },
+ {
+ "epoch": 13.0,
+ "grad_norm": 0.541262149810791,
+ "learning_rate": 3.5025000000000004e-05,
+ "loss": 0.2454,
+ "step": 2600
+ },
+ {
+ "epoch": 13.0,
+ "eval_loss": 0.29390111565589905,
+ "eval_runtime": 34.9798,
+ "eval_samples_per_second": 14.294,
+ "eval_steps_per_second": 1.429,
+ "step": 2600
+ },
+ {
+ "epoch": 13.25,
+ "grad_norm": 0.6019225716590881,
+ "learning_rate": 3.3775e-05,
+ "loss": 0.2341,
+ "step": 2650
+ },
+ {
+ "epoch": 13.5,
+ "grad_norm": 0.5700891017913818,
+ "learning_rate": 3.2525e-05,
+ "loss": 0.2374,
+ "step": 2700
+ },
+ {
+ "epoch": 13.75,
+ "grad_norm": 0.6702916622161865,
+ "learning_rate": 3.1275e-05,
+ "loss": 0.2378,
+ "step": 2750
+ },
+ {
+ "epoch": 14.0,
+ "grad_norm": 0.6081238985061646,
+ "learning_rate": 3.0025000000000005e-05,
+ "loss": 0.2379,
+ "step": 2800
+ },
+ {
+ "epoch": 14.0,
+ "eval_loss": 0.29895922541618347,
+ "eval_runtime": 34.9629,
+ "eval_samples_per_second": 14.301,
+ "eval_steps_per_second": 1.43,
+ "step": 2800
+ },
+ {
+ "epoch": 14.25,
+ "grad_norm": 0.9264949560165405,
+ "learning_rate": 2.8775e-05,
+ "loss": 0.2259,
+ "step": 2850
+ },
+ {
+ "epoch": 14.5,
+ "grad_norm": 0.6585692763328552,
+ "learning_rate": 2.7525e-05,
+ "loss": 0.2282,
+ "step": 2900
+ },
+ {
+ "epoch": 14.75,
+ "grad_norm": 0.698367714881897,
+ "learning_rate": 2.6275e-05,
+ "loss": 0.2303,
+ "step": 2950
+ },
+ {
+ "epoch": 15.0,
+ "grad_norm": 0.6891688108444214,
+ "learning_rate": 2.5025e-05,
+ "loss": 0.2298,
+ "step": 3000
+ },
+ {
+ "epoch": 15.0,
+ "eval_loss": 0.30702412128448486,
+ "eval_runtime": 34.9664,
+ "eval_samples_per_second": 14.299,
+ "eval_steps_per_second": 1.43,
+ "step": 3000
+ },
+ {
+ "epoch": 15.25,
+ "grad_norm": 0.7823687195777893,
+ "learning_rate": 2.3775e-05,
+ "loss": 0.2167,
+ "step": 3050
+ },
+ {
+ "epoch": 15.5,
+ "grad_norm": 0.7410340905189514,
+ "learning_rate": 2.2525000000000002e-05,
+ "loss": 0.2188,
+ "step": 3100
+ },
+ {
+ "epoch": 15.75,
+ "grad_norm": 0.815812349319458,
+ "learning_rate": 2.1275000000000002e-05,
+ "loss": 0.2204,
+ "step": 3150
+ },
+ {
+ "epoch": 16.0,
+ "grad_norm": 1.0224624872207642,
+ "learning_rate": 2.0025000000000002e-05,
+ "loss": 0.2215,
+ "step": 3200
+ },
+ {
+ "epoch": 16.0,
+ "eval_loss": 0.31381356716156006,
+ "eval_runtime": 34.9748,
+ "eval_samples_per_second": 14.296,
+ "eval_steps_per_second": 1.43,
+ "step": 3200
+ },
+ {
+ "epoch": 16.25,
+ "grad_norm": 0.8359068632125854,
+ "learning_rate": 1.8775000000000002e-05,
+ "loss": 0.2066,
+ "step": 3250
+ },
+ {
+ "epoch": 16.5,
+ "grad_norm": 0.8485832214355469,
+ "learning_rate": 1.7525e-05,
+ "loss": 0.2086,
+ "step": 3300
+ },
+ {
+ "epoch": 16.75,
+ "grad_norm": 0.9015569686889648,
+ "learning_rate": 1.6275000000000003e-05,
+ "loss": 0.2098,
+ "step": 3350
+ },
+ {
+ "epoch": 17.0,
+ "grad_norm": 0.85235595703125,
+ "learning_rate": 1.5025000000000001e-05,
+ "loss": 0.2101,
+ "step": 3400
+ },
+ {
+ "epoch": 17.0,
+ "eval_loss": 0.321372389793396,
+ "eval_runtime": 34.9534,
+ "eval_samples_per_second": 14.305,
+ "eval_steps_per_second": 1.43,
+ "step": 3400
+ },
+ {
+ "epoch": 17.25,
+ "grad_norm": 0.9530749917030334,
+ "learning_rate": 1.3775000000000001e-05,
+ "loss": 0.1951,
+ "step": 3450
+ },
+ {
+ "epoch": 17.5,
+ "grad_norm": 0.9698864817619324,
+ "learning_rate": 1.2525000000000001e-05,
+ "loss": 0.1972,
+ "step": 3500
+ },
+ {
+ "epoch": 17.75,
+ "grad_norm": 0.9734402894973755,
+ "learning_rate": 1.1275000000000001e-05,
+ "loss": 0.1982,
+ "step": 3550
+ },
+ {
+ "epoch": 18.0,
+ "grad_norm": 0.9597378373146057,
+ "learning_rate": 1.0025000000000001e-05,
+ "loss": 0.2002,
+ "step": 3600
+ },
+ {
+ "epoch": 18.0,
+ "eval_loss": 0.3291071653366089,
+ "eval_runtime": 34.9444,
+ "eval_samples_per_second": 14.308,
+ "eval_steps_per_second": 1.431,
+ "step": 3600
+ },
+ {
+ "epoch": 18.25,
+ "grad_norm": 0.9978246688842773,
+ "learning_rate": 8.775e-06,
+ "loss": 0.1851,
+ "step": 3650
+ },
+ {
+ "epoch": 18.5,
+ "grad_norm": 1.0513160228729248,
+ "learning_rate": 7.525e-06,
+ "loss": 0.1874,
+ "step": 3700
+ },
+ {
+ "epoch": 18.75,
+ "grad_norm": 1.0297590494155884,
+ "learning_rate": 6.275e-06,
+ "loss": 0.1876,
+ "step": 3750
+ },
+ {
+ "epoch": 19.0,
+ "grad_norm": 1.0286418199539185,
+ "learning_rate": 5.025e-06,
+ "loss": 0.1883,
+ "step": 3800
+ },
+ {
+ "epoch": 19.0,
+ "eval_loss": 0.3394256830215454,
+ "eval_runtime": 34.9405,
+ "eval_samples_per_second": 14.31,
+ "eval_steps_per_second": 1.431,
+ "step": 3800
+ },
+ {
+ "epoch": 19.25,
+ "grad_norm": 1.0999506711959839,
+ "learning_rate": 3.775e-06,
+ "loss": 0.1794,
+ "step": 3850
+ },
+ {
+ "epoch": 19.5,
+ "grad_norm": 1.1211074590682983,
+ "learning_rate": 2.5250000000000004e-06,
+ "loss": 0.1784,
+ "step": 3900
+ },
+ {
+ "epoch": 19.75,
+ "grad_norm": 1.0398590564727783,
+ "learning_rate": 1.275e-06,
+ "loss": 0.1781,
+ "step": 3950
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 1.1269780397415161,
+ "learning_rate": 2.5000000000000002e-08,
+ "loss": 0.1784,
+ "step": 4000
+ },
+ {
+ "epoch": 20.0,
+ "eval_loss": 0.34511706233024597,
+ "eval_runtime": 34.9533,
+ "eval_samples_per_second": 14.305,
+ "eval_steps_per_second": 1.43,
+ "step": 4000
  }
  ],
  "logging_steps": 50,
+ "max_steps": 4000,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 20,
  "save_steps": 500,

  "should_evaluate": false,
  "should_log": false,
  "should_save": true,
+ "should_training_stop": true
  },
  "attributes": {}
  }
  },
+ "total_flos": 2.073044089536123e+18,
  "train_batch_size": 10,
  "trial_name": null,
  "trial_params": null
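The new trainer_state.json records a full 20-epoch run (global_step 4000), and its `best_global_step`/`best_metric` fields correspond to the `log_history` entry with the lowest `eval_loss`. A minimal sketch of recovering those fields from the log; the `best_eval_checkpoint` helper is an illustrative name, and the inlined `log` is an abbreviated subset of eval entries taken from the diff above:

```python
def best_eval_checkpoint(log_history):
    """Return (step, eval_loss) for the log entry with the lowest eval_loss."""
    evals = [e for e in log_history if "eval_loss" in e]  # skip training-loss entries
    best = min(evals, key=lambda e: e["eval_loss"])
    return best["step"], best["eval_loss"]

# Abbreviated log_history: a few eval entries from the new trainer_state.
log = [
    {"epoch": 5.0, "eval_loss": 0.283452570438385, "step": 1000},
    {"epoch": 6.0, "eval_loss": 0.28378918766975403, "step": 1200},
    {"epoch": 20.0, "eval_loss": 0.34511706233024597, "step": 4000},
]
step, loss = best_eval_checkpoint(log)
print(step, loss)
```

On the full log this reproduces `"best_global_step": 1000` and `"best_metric": 0.283452570438385`: eval loss bottoms out at epoch 5 and rises steadily afterward, which is why the best checkpoint is a quarter of the way through the run.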
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1ccfc48d5bf5d433a7901c2e32bcd41977a609c84a21fd4563bfa19f725b9990
- size 5240

  version https://git-lfs.github.com/spec/v1
+ oid sha256:f0c477509df8159054dc811265d4f27c236cef7ab115b1572faaba91e539c09f
+ size 5304