Ben88 committed
Commit b6b96a4 · verified · 1 Parent(s): 922e2c9

Upload 15 files
README.md CHANGED
@@ -1,47 +1,202 @@
  ---
- base_model: Qwen/Qwen3-8B
  library_name: peft
- license: mit
  ---

  ### Direct Use

- #Load model

- ```
- from peft import PeftModel
- from transformers import AutoModelForCausalLM, AutoTokenizer

- base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")
- model3 = PeftModel.from_pretrained(base_model, "TMAE-Triage/MedConsultLLM")
- tokenizer3 = AutoTokenizer.from_pretrained("TMAE-Triage/MedConsultLLM")

- inputs3 = tokenizer3("Complication: NKDA \n\n Patient SOB & RNA." + " <|expand|>", return_tensors="pt")
- outputs3 = model3.generate(input_ids=inputs3.input_ids, max_new_tokens=50)
- print(tokenizer3.decode(outputs3[0], skip_special_tokens=True))

- #Result:

- Complication: NKDA

- Patient SOB & RNA. <|expand|> Complication: no known drug allergies.

- Patient shortness of breath and ribonucleic acid.
- ```

  #### Metrics

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650275ef20246c6f9f4e74b1/rVfucAsQ0JLgVOelrCaHQ.png)

- ### Training Detail

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/650275ef20246c6f9f4e74b1/pmskOxhOBrutdKu_c3FZv.png)

  ### Framework versions

  - PEFT 0.15.2

  ---
+ base_model: Qwen/Qwen3-32B
  library_name: peft
  ---

+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]

+ #### Training Hyperparameters

+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

+ #### Speeds, Sizes, Times [optional]

+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ [More Information Needed]

+ ## Evaluation

+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]

  #### Metrics

+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors [optional]

+ [More Information Needed]

+ ## Model Card Contact

+ [More Information Needed]
  ### Framework versions

  - PEFT 0.15.2
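The removed quick-start above shows the intended usage pattern: a clinical note followed by an `<|expand|>` control token, after which the model emits the note with abbreviations expanded (e.g. NKDA → no known drug allergies). A minimal sketch of just the prompt construction, independent of any model; the `EXPAND_TOKEN` constant and `build_prompt` helper are illustrative names, not part of the repository — only the `<|expand|>` token itself comes from the diff:

```python
# The <|expand|> token appears in the removed README's example prompt.
EXPAND_TOKEN = "<|expand|>"

def build_prompt(note: str) -> str:
    """Append the abbreviation-expansion control token to a clinical note."""
    return note + " " + EXPAND_TOKEN

prompt = build_prompt("Complication: NKDA \n\n Patient SOB & RNA.")
print(prompt)
```

The assembled string is what the removed example passed to the tokenizer before calling `generate`.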
adapter_config.json CHANGED
@@ -1,7 +1,7 @@
  {
  "alpha_pattern": {},
  "auto_mapping": null,
- "base_model_name_or_path": "Qwen/Qwen3-8B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,
@@ -24,10 +24,10 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "o_proj",
  "q_proj",
  "v_proj",
- "k_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,

  {
  "alpha_pattern": {},
  "auto_mapping": null,
+ "base_model_name_or_path": "Qwen/Qwen3-32B",
  "bias": "none",
  "corda_config": null,
  "eva_config": null,

  "rank_pattern": {},
  "revision": null,
  "target_modules": [
  "q_proj",
+ "k_proj",
  "v_proj",
+ "o_proj"
  ],
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
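The adapter_config.json change above does two things: it swaps the base checkpoint from Qwen/Qwen3-8B to Qwen/Qwen3-32B, and it lists all four attention projections as LoRA target modules (the module set is unchanged; only its order differs). A minimal stdlib sketch of applying the same edit programmatically — the `update_adapter_config` helper is a hypothetical name for illustration, not something in the repo:

```python
import json

def update_adapter_config(config: dict) -> dict:
    """Apply the same changes the diff above makes to adapter_config.json."""
    config = dict(config)  # leave the input untouched
    config["base_model_name_or_path"] = "Qwen/Qwen3-32B"
    # All four attention projections remain LoRA targets, now in q/k/v/o order.
    config["target_modules"] = ["q_proj", "k_proj", "v_proj", "o_proj"]
    return config

old = {
    "base_model_name_or_path": "Qwen/Qwen3-8B",
    "target_modules": ["o_proj", "q_proj", "v_proj", "k_proj"],
    "task_type": "CAUSAL_LM",
}
new = update_adapter_config(old)
print(json.dumps(new, indent=2))
```

Note that PEFT treats `target_modules` as a set of module-name suffixes, so the reordering alone does not change which layers get adapters.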
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e27db5e2383c2771890e5073626ff646df06ef2607a6cdfda404dfbf7050abe2
- size 30709192

  version https://git-lfs.github.com/spec/v1
+ oid sha256:b712f07b25f6d3059c623ccccb84b0ad93fcdf619df26bf6ee5a7b8e3c0c174d
+ size 79760200
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:388ebeaf77ae8ef3a743b4e98447f815f772a75cd5f6b8275c4d3553f4606016
- size 61583354

  version https://git-lfs.github.com/spec/v1
+ oid sha256:661747ed17bd07ad8384a6616d8395331e4741d9ce3e4ae7563197f55eb6a147
+ size 159814674
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1481c7f8158a49fc2211c2fa3809aad42a5de6192cc1b2af127c2864a6eda6c9
  size 14244

  version https://git-lfs.github.com/spec/v1
+ oid sha256:5e3b708c3b2e76466f3ac803dc0a7f21e15943a4392d9feca4425baf62611de9
  size 14244
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:769ae8e1907c467549704a04a542a28ae1480d439b7a7783f536463f0a5f0e3c
  size 1064

  version https://git-lfs.github.com/spec/v1
+ oid sha256:19b2629bc0bd8daaa99d91d86714f96b66f12a2bb2230a803ab28a5a80c623c3
  size 1064
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1574cf58b63a2a56db9bc28f6ddcac4ece87690840939153189077692486f4ee
- size 11422920

  version https://git-lfs.github.com/spec/v1
+ oid sha256:fbd5dd30a62db2f0ead71513492e40939dca4240dd5141e0a525212e2a45ff74
+ size 11422923
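The binary files in this commit are stored as Git LFS pointers: three plain-text lines giving the spec version, a `sha256` object id, and the size in bytes. A minimal stdlib sketch of parsing one such pointer (the `parse_lfs_pointer` helper name is illustrative; the pointer text is the new tokenizer.json pointer from the diff above):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its space-separated key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fbd5dd30a62db2f0ead71513492e40939dca4240dd5141e0a525212e2a45ff74
size 11422923"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # size of the actual blob in bytes, not of the pointer
```

The `CHANGED` sections above are therefore diffs of these tiny pointer files; the real payloads live in LFS storage, addressed by the `oid`.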
trainer_state.json CHANGED
@@ -1,187 +1,737 @@
  {
- "best_global_step": 720,
- "best_metric": 0.30677664279937744,
- "best_model_checkpoint": "./qwen_medical_reports_finetuned/checkpoint-720",
- "epoch": 9.0,
  "eval_steps": 500,
- "global_step": 720,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
- "epoch": 0.625,
- "grad_norm": 0.5614568591117859,
- "learning_rate": 9.69375e-05,
- "loss": 4.086,
  "step": 50
  },
  {
  "epoch": 1.0,
- "eval_loss": 0.7464421987533569,
- "eval_runtime": 2.8902,
- "eval_samples_per_second": 69.198,
- "eval_steps_per_second": 6.92,
- "step": 80
  },
  {
  "epoch": 1.25,
- "grad_norm": 0.4924369752407074,
- "learning_rate": 9.38125e-05,
- "loss": 0.7541,
- "step": 100
  },
  {
- "epoch": 1.875,
- "grad_norm": 0.3845207393169403,
- "learning_rate": 9.06875e-05,
- "loss": 0.5257,
- "step": 150
  },
  {
  "epoch": 2.0,
- "eval_loss": 0.5174583196640015,
- "eval_runtime": 2.8881,
- "eval_samples_per_second": 69.25,
- "eval_steps_per_second": 6.925,
- "step": 160
  },
  {
  "epoch": 2.5,
- "grad_norm": 0.5862274169921875,
- "learning_rate": 8.756250000000001e-05,
- "loss": 0.441,
- "step": 200
  },
  {
  "epoch": 3.0,
- "eval_loss": 0.4074719548225403,
- "eval_runtime": 2.8904,
- "eval_samples_per_second": 69.195,
- "eval_steps_per_second": 6.92,
- "step": 240
  },
  {
- "epoch": 3.125,
- "grad_norm": 0.6350286602973938,
- "learning_rate": 8.44375e-05,
- "loss": 0.3827,
- "step": 250
  },
  {
  "epoch": 3.75,
- "grad_norm": 0.7117815017700195,
- "learning_rate": 8.13125e-05,
- "loss": 0.3275,
- "step": 300
  },
  {
  "epoch": 4.0,
- "eval_loss": 0.34577617049217224,
- "eval_runtime": 2.8862,
- "eval_samples_per_second": 69.295,
- "eval_steps_per_second": 6.93,
- "step": 320
  },
  {
- "epoch": 4.375,
- "grad_norm": 0.643543004989624,
- "learning_rate": 7.81875e-05,
- "loss": 0.2982,
- "step": 350
  },
  {
  "epoch": 5.0,
- "grad_norm": 0.6653191447257996,
- "learning_rate": 7.50625e-05,
- "loss": 0.286,
- "step": 400
  },
  {
  "epoch": 5.0,
- "eval_loss": 0.3198166489601135,
- "eval_runtime": 2.889,
- "eval_samples_per_second": 69.228,
- "eval_steps_per_second": 6.923,
- "step": 400
  },
  {
- "epoch": 5.625,
- "grad_norm": 1.0326348543167114,
- "learning_rate": 7.193750000000001e-05,
- "loss": 0.273,
- "step": 450
  },
  {
  "epoch": 6.0,
- "eval_loss": 0.3160374164581299,
- "eval_runtime": 2.8861,
- "eval_samples_per_second": 69.297,
- "eval_steps_per_second": 6.93,
- "step": 480
  },
  {
  "epoch": 6.25,
- "grad_norm": 0.591899573802948,
- "learning_rate": 6.88125e-05,
- "loss": 0.2706,
- "step": 500
  },
  {
- "epoch": 6.875,
- "grad_norm": 0.6523064970970154,
- "learning_rate": 6.56875e-05,
- "loss": 0.2667,
- "step": 550
  },
  {
  "epoch": 7.0,
- "eval_loss": 0.3083480894565582,
- "eval_runtime": 2.888,
- "eval_samples_per_second": 69.252,
- "eval_steps_per_second": 6.925,
- "step": 560
  },
  {
  "epoch": 7.5,
- "grad_norm": 0.5645153522491455,
- "learning_rate": 6.25625e-05,
- "loss": 0.2618,
- "step": 600
  },
  {
  "epoch": 8.0,
- "eval_loss": 0.30845606327056885,
- "eval_runtime": 2.8881,
- "eval_samples_per_second": 69.249,
- "eval_steps_per_second": 6.925,
- "step": 640
  },
  {
- "epoch": 8.125,
- "grad_norm": 0.5773406624794006,
- "learning_rate": 5.94375e-05,
- "loss": 0.261,
- "step": 650
  },
  {
  "epoch": 8.75,
- "grad_norm": 0.62261563539505,
- "learning_rate": 5.63125e-05,
- "loss": 0.2568,
- "step": 700
  },
  {
  "epoch": 9.0,
- "eval_loss": 0.30677664279937744,
- "eval_runtime": 2.8866,
- "eval_samples_per_second": 69.286,
- "eval_steps_per_second": 6.929,
- "step": 720
  }
  ],
  "logging_steps": 50,
- "max_steps": 1600,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 20,
  "save_steps": 500,
@@ -192,12 +742,12 @@
  "should_evaluate": false,
  "should_log": false,
  "should_save": true,
- "should_training_stop": false
  },
  "attributes": {}
  }
  },
- "total_flos": 9.03310361690112e+16,
  "train_batch_size": 10,
  "trial_name": null,
  "trial_params": null
 
  {
+ "best_global_step": 1000,
+ "best_metric": 0.283452570438385,
+ "best_model_checkpoint": "./qwen_medical_reports_finetuned/checkpoint-1000",
+ "epoch": 20.0,
  "eval_steps": 500,
+ "global_step": 4000,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
+ "epoch": 0.25,
+ "grad_norm": 0.43099120259284973,
+ "learning_rate": 9.8775e-05,
+ "loss": 3.1377,
  "step": 50
  },
+ {
+ "epoch": 0.5,
+ "grad_norm": 0.3961893618106842,
+ "learning_rate": 9.7525e-05,
+ "loss": 0.708,
+ "step": 100
+ },
+ {
+ "epoch": 0.75,
+ "grad_norm": 0.43528544902801514,
+ "learning_rate": 9.627500000000001e-05,
+ "loss": 0.5117,
+ "step": 150
+ },
  {
  "epoch": 1.0,
+ "grad_norm": 0.552211582660675,
+ "learning_rate": 9.5025e-05,
+ "loss": 0.4371,
+ "step": 200
+ },
+ {
+ "epoch": 1.0,
+ "eval_loss": 0.40502533316612244,
+ "eval_runtime": 34.9882,
+ "eval_samples_per_second": 14.291,
+ "eval_steps_per_second": 1.429,
+ "step": 200
  },
  {
  "epoch": 1.25,
+ "grad_norm": 0.6529857516288757,
+ "learning_rate": 9.3775e-05,
+ "loss": 0.3557,
+ "step": 250
  },
  {
+ "epoch": 1.5,
+ "grad_norm": 0.5617239475250244,
+ "learning_rate": 9.252500000000001e-05,
+ "loss": 0.3167,
+ "step": 300
+ },
+ {
+ "epoch": 1.75,
+ "grad_norm": 0.5521848201751709,
+ "learning_rate": 9.1275e-05,
+ "loss": 0.2987,
+ "step": 350
  },
  {
  "epoch": 2.0,
+ "grad_norm": 0.5572051405906677,
+ "learning_rate": 9.0025e-05,
+ "loss": 0.2876,
+ "step": 400
+ },
+ {
+ "epoch": 2.0,
+ "eval_loss": 0.29635289311408997,
+ "eval_runtime": 34.9663,
+ "eval_samples_per_second": 14.3,
+ "eval_steps_per_second": 1.43,
+ "step": 400
+ },
+ {
+ "epoch": 2.25,
+ "grad_norm": 0.4148677587509155,
+ "learning_rate": 8.8775e-05,
+ "loss": 0.2775,
+ "step": 450
  },
  {
  "epoch": 2.5,
+ "grad_norm": 0.5877013206481934,
+ "learning_rate": 8.7525e-05,
+ "loss": 0.2775,
+ "step": 500
+ },
+ {
+ "epoch": 2.75,
+ "grad_norm": 0.46404188871383667,
+ "learning_rate": 8.627500000000001e-05,
+ "loss": 0.2757,
+ "step": 550
  },
  {
  "epoch": 3.0,
+ "grad_norm": 0.45785900950431824,
+ "learning_rate": 8.502499999999999e-05,
+ "loss": 0.2755,
+ "step": 600
  },
  {
+ "epoch": 3.0,
+ "eval_loss": 0.28746917843818665,
+ "eval_runtime": 34.9592,
+ "eval_samples_per_second": 14.302,
+ "eval_steps_per_second": 1.43,
+ "step": 600
+ },
+ {
+ "epoch": 3.25,
+ "grad_norm": 0.4644843637943268,
+ "learning_rate": 8.3775e-05,
+ "loss": 0.2711,
+ "step": 650
+ },
+ {
+ "epoch": 3.5,
+ "grad_norm": 0.43433475494384766,
+ "learning_rate": 8.252500000000001e-05,
+ "loss": 0.2712,
+ "step": 700
  },
  {
  "epoch": 3.75,
+ "grad_norm": 0.38372883200645447,
+ "learning_rate": 8.1275e-05,
+ "loss": 0.2713,
+ "step": 750
  },
  {
  "epoch": 4.0,
+ "grad_norm": 0.39236927032470703,
+ "learning_rate": 8.002500000000001e-05,
+ "loss": 0.2709,
+ "step": 800
  },
  {
+ "epoch": 4.0,
+ "eval_loss": 0.2852801978588104,
+ "eval_runtime": 34.9716,
+ "eval_samples_per_second": 14.297,
+ "eval_steps_per_second": 1.43,
+ "step": 800
+ },
+ {
+ "epoch": 4.25,
+ "grad_norm": 0.422234445810318,
+ "learning_rate": 7.8775e-05,
+ "loss": 0.2679,
+ "step": 850
+ },
+ {
+ "epoch": 4.5,
+ "grad_norm": 0.4149724543094635,
+ "learning_rate": 7.7525e-05,
+ "loss": 0.2682,
+ "step": 900
+ },
+ {
+ "epoch": 4.75,
+ "grad_norm": 0.37782400846481323,
+ "learning_rate": 7.627500000000001e-05,
+ "loss": 0.2683,
+ "step": 950
  },
  {
  "epoch": 5.0,
+ "grad_norm": 0.4044494926929474,
+ "learning_rate": 7.502500000000001e-05,
+ "loss": 0.2685,
+ "step": 1000
  },
  {
  "epoch": 5.0,
+ "eval_loss": 0.283452570438385,
+ "eval_runtime": 34.9579,
+ "eval_samples_per_second": 14.303,
+ "eval_steps_per_second": 1.43,
+ "step": 1000
  },
  {
+ "epoch": 5.25,
+ "grad_norm": 0.41983741521835327,
+ "learning_rate": 7.3775e-05,
+ "loss": 0.2649,
+ "step": 1050
+ },
+ {
+ "epoch": 5.5,
+ "grad_norm": 0.3734280467033386,
+ "learning_rate": 7.2525e-05,
+ "loss": 0.2661,
+ "step": 1100
+ },
+ {
+ "epoch": 5.75,
+ "grad_norm": 0.40924379229545593,
+ "learning_rate": 7.1275e-05,
+ "loss": 0.2659,
+ "step": 1150
+ },
+ {
+ "epoch": 6.0,
+ "grad_norm": 0.3856299817562103,
+ "learning_rate": 7.002500000000001e-05,
+ "loss": 0.2656,
+ "step": 1200
  },
  {
  "epoch": 6.0,
+ "eval_loss": 0.28378918766975403,
+ "eval_runtime": 34.9635,
+ "eval_samples_per_second": 14.301,
+ "eval_steps_per_second": 1.43,
+ "step": 1200
  },
  {
  "epoch": 6.25,
+ "grad_norm": 0.5564974546432495,
+ "learning_rate": 6.8775e-05,
+ "loss": 0.2628,
+ "step": 1250
  },
  {
+ "epoch": 6.5,
+ "grad_norm": 0.47460582852363586,
+ "learning_rate": 6.7525e-05,
+ "loss": 0.2632,
+ "step": 1300
+ },
+ {
+ "epoch": 6.75,
+ "grad_norm": 0.39553120732307434,
+ "learning_rate": 6.6275e-05,
+ "loss": 0.2637,
+ "step": 1350
+ },
+ {
+ "epoch": 7.0,
+ "grad_norm": 0.3899823725223541,
+ "learning_rate": 6.502500000000001e-05,
+ "loss": 0.2643,
+ "step": 1400
  },
  {
  "epoch": 7.0,
+ "eval_loss": 0.28401800990104675,
+ "eval_runtime": 34.97,
+ "eval_samples_per_second": 14.298,
+ "eval_steps_per_second": 1.43,
+ "step": 1400
+ },
+ {
+ "epoch": 7.25,
+ "grad_norm": 0.4034035801887512,
+ "learning_rate": 6.3775e-05,
+ "loss": 0.2609,
+ "step": 1450
  },
  {
  "epoch": 7.5,
+ "grad_norm": 0.3818899691104889,
+ "learning_rate": 6.2525e-05,
+ "loss": 0.2622,
+ "step": 1500
+ },
+ {
+ "epoch": 7.75,
+ "grad_norm": 0.5591850876808167,
+ "learning_rate": 6.1275e-05,
+ "loss": 0.2614,
+ "step": 1550
  },
  {
  "epoch": 8.0,
+ "grad_norm": 0.38951823115348816,
+ "learning_rate": 6.0024999999999995e-05,
+ "loss": 0.2628,
+ "step": 1600
  },
  {
+ "epoch": 8.0,
+ "eval_loss": 0.2841391861438751,
+ "eval_runtime": 34.9774,
+ "eval_samples_per_second": 14.295,
+ "eval_steps_per_second": 1.429,
+ "step": 1600
+ },
+ {
+ "epoch": 8.25,
+ "grad_norm": 0.4582635164260864,
+ "learning_rate": 5.8775000000000006e-05,
+ "loss": 0.258,
+ "step": 1650
+ },
+ {
+ "epoch": 8.5,
+ "grad_norm": 0.44102761149406433,
+ "learning_rate": 5.752500000000001e-05,
+ "loss": 0.2584,
+ "step": 1700
  },
  {
  "epoch": 8.75,
+ "grad_norm": 0.44510433077812195,
+ "learning_rate": 5.6275e-05,
+ "loss": 0.2589,
+ "step": 1750
+ },
+ {
+ "epoch": 9.0,
+ "grad_norm": 0.41570335626602173,
+ "learning_rate": 5.5025e-05,
+ "loss": 0.2603,
+ "step": 1800
  },
  {
  "epoch": 9.0,
+ "eval_loss": 0.2841310203075409,
+ "eval_runtime": 34.9715,
+ "eval_samples_per_second": 14.297,
+ "eval_steps_per_second": 1.43,
+ "step": 1800
+ },
+ {
+ "epoch": 9.25,
+ "grad_norm": 0.45228293538093567,
+ "learning_rate": 5.3775e-05,
+ "loss": 0.2542,
+ "step": 1850
+ },
+ {
+ "epoch": 9.5,
+ "grad_norm": 0.45636293292045593,
+ "learning_rate": 5.2525e-05,
+ "loss": 0.256,
+ "step": 1900
+ },
+ {
+ "epoch": 9.75,
+ "grad_norm": 0.4527854919433594,
+ "learning_rate": 5.1275000000000006e-05,
+ "loss": 0.2575,
+ "step": 1950
+ },
+ {
+ "epoch": 10.0,
+ "grad_norm": 0.4357495605945587,
+ "learning_rate": 5.0025e-05,
+ "loss": 0.2567,
+ "step": 2000
+ },
+ {
+ "epoch": 10.0,
+ "eval_loss": 0.28737518191337585,
+ "eval_runtime": 34.9799,
+ "eval_samples_per_second": 14.294,
+ "eval_steps_per_second": 1.429,
+ "step": 2000
+ },
+ {
+ "epoch": 10.25,
+ "grad_norm": 0.481499046087265,
+ "learning_rate": 4.8775000000000007e-05,
+ "loss": 0.2509,
+ "step": 2050
+ },
+ {
+ "epoch": 10.5,
+ "grad_norm": 0.474854439496994,
+ "learning_rate": 4.7525e-05,
+ "loss": 0.2522,
+ "step": 2100
+ },
+ {
+ "epoch": 10.75,
+ "grad_norm": 0.4796244204044342,
+ "learning_rate": 4.6275e-05,
+ "loss": 0.2538,
+ "step": 2150
+ },
+ {
+ "epoch": 11.0,
+ "grad_norm": 0.4484533667564392,
+ "learning_rate": 4.5025000000000003e-05,
+ "loss": 0.2542,
+ "step": 2200
+ },
+ {
+ "epoch": 11.0,
+ "eval_loss": 0.2882769703865051,
+ "eval_runtime": 34.9674,
+ "eval_samples_per_second": 14.299,
+ "eval_steps_per_second": 1.43,
+ "step": 2200
+ },
+ {
+ "epoch": 11.25,
+ "grad_norm": 0.601557195186615,
+ "learning_rate": 4.3775e-05,
+ "loss": 0.2463,
+ "step": 2250
+ },
+ {
+ "epoch": 11.5,
+ "grad_norm": 0.5119094848632812,
+ "learning_rate": 4.2525000000000004e-05,
+ "loss": 0.2485,
+ "step": 2300
+ },
+ {
+ "epoch": 11.75,
+ "grad_norm": 0.5228874683380127,
+ "learning_rate": 4.1275e-05,
+ "loss": 0.2493,
+ "step": 2350
+ },
+ {
+ "epoch": 12.0,
+ "grad_norm": 0.4916130304336548,
+ "learning_rate": 4.0025000000000004e-05,
+ "loss": 0.2498,
+ "step": 2400
+ },
+ {
+ "epoch": 12.0,
+ "eval_loss": 0.29168301820755005,
+ "eval_runtime": 34.9775,
+ "eval_samples_per_second": 14.295,
+ "eval_steps_per_second": 1.429,
+ "step": 2400
+ },
+ {
+ "epoch": 12.25,
+ "grad_norm": 0.5594790577888489,
+ "learning_rate": 3.8775e-05,
+ "loss": 0.2402,
+ "step": 2450
+ },
+ {
+ "epoch": 12.5,
+ "grad_norm": 0.5535025000572205,
+ "learning_rate": 3.7525e-05,
+ "loss": 0.2432,
+ "step": 2500
+ },
+ {
+ "epoch": 12.75,
+ "grad_norm": 0.6284328699111938,
+ "learning_rate": 3.6275e-05,
+ "loss": 0.245,
+ "step": 2550
+ },
+ {
+ "epoch": 13.0,
+ "grad_norm": 0.541262149810791,
+ "learning_rate": 3.5025000000000004e-05,
+ "loss": 0.2454,
+ "step": 2600
+ },
+ {
+ "epoch": 13.0,
+ "eval_loss": 0.29390111565589905,
+ "eval_runtime": 34.9798,
+ "eval_samples_per_second": 14.294,
+ "eval_steps_per_second": 1.429,
+ "step": 2600
+ },
+ {
+ "epoch": 13.25,
+ "grad_norm": 0.6019225716590881,
+ "learning_rate": 3.3775e-05,
+ "loss": 0.2341,
+ "step": 2650
+ },
+ {
+ "epoch": 13.5,
+ "grad_norm": 0.5700891017913818,
+ "learning_rate": 3.2525e-05,
+ "loss": 0.2374,
+ "step": 2700
+ },
+ {
+ "epoch": 13.75,
+ "grad_norm": 0.6702916622161865,
+ "learning_rate": 3.1275e-05,
+ "loss": 0.2378,
+ "step": 2750
+ },
+ {
+ "epoch": 14.0,
+ "grad_norm": 0.6081238985061646,
+ "learning_rate": 3.0025000000000005e-05,
+ "loss": 0.2379,
+ "step": 2800
+ },
+ {
+ "epoch": 14.0,
+ "eval_loss": 0.29895922541618347,
+ "eval_runtime": 34.9629,
+ "eval_samples_per_second": 14.301,
+ "eval_steps_per_second": 1.43,
+ "step": 2800
+ },
+ {
+ "epoch": 14.25,
+ "grad_norm": 0.9264949560165405,
+ "learning_rate": 2.8775e-05,
+ "loss": 0.2259,
+ "step": 2850
+ },
+ {
+ "epoch": 14.5,
+ "grad_norm": 0.6585692763328552,
+ "learning_rate": 2.7525e-05,
+ "loss": 0.2282,
+ "step": 2900
+ },
+ {
+ "epoch": 14.75,
+ "grad_norm": 0.698367714881897,
+ "learning_rate": 2.6275e-05,
+ "loss": 0.2303,
+ "step": 2950
+ },
+ {
+ "epoch": 15.0,
+ "grad_norm": 0.6891688108444214,
+ "learning_rate": 2.5025e-05,
+ "loss": 0.2298,
+ "step": 3000
+ },
+ {
+ "epoch": 15.0,
+ "eval_loss": 0.30702412128448486,
+ "eval_runtime": 34.9664,
+ "eval_samples_per_second": 14.299,
+ "eval_steps_per_second": 1.43,
+ "step": 3000
+ },
+ {
+ "epoch": 15.25,
+ "grad_norm": 0.7823687195777893,
+ "learning_rate": 2.3775e-05,
+ "loss": 0.2167,
+ "step": 3050
+ },
+ {
+ "epoch": 15.5,
+ "grad_norm": 0.7410340905189514,
+ "learning_rate": 2.2525000000000002e-05,
+ "loss": 0.2188,
+ "step": 3100
+ },
+ {
+ "epoch": 15.75,
+ "grad_norm": 0.815812349319458,
+ "learning_rate": 2.1275000000000002e-05,
+ "loss": 0.2204,
+ "step": 3150
+ },
+ {
+ "epoch": 16.0,
+ "grad_norm": 1.0224624872207642,
+ "learning_rate": 2.0025000000000002e-05,
+ "loss": 0.2215,
+ "step": 3200
+ },
+ {
+ "epoch": 16.0,
+ "eval_loss": 0.31381356716156006,
+ "eval_runtime": 34.9748,
+ "eval_samples_per_second": 14.296,
+ "eval_steps_per_second": 1.43,
+ "step": 3200
+ },
+ {
+ "epoch": 16.25,
+ "grad_norm": 0.8359068632125854,
+ "learning_rate": 1.8775000000000002e-05,
+ "loss": 0.2066,
+ "step": 3250
+ },
+ {
+ "epoch": 16.5,
+ "grad_norm": 0.8485832214355469,
+ "learning_rate": 1.7525e-05,
+ "loss": 0.2086,
+ "step": 3300
+ },
+ {
+ "epoch": 16.75,
+ "grad_norm": 0.9015569686889648,
+ "learning_rate": 1.6275000000000003e-05,
+ "loss": 0.2098,
+ "step": 3350
+ },
+ {
+ "epoch": 17.0,
+ "grad_norm": 0.85235595703125,
+ "learning_rate": 1.5025000000000001e-05,
+ "loss": 0.2101,
+ "step": 3400
+ },
+ {
+ "epoch": 17.0,
+ "eval_loss": 0.321372389793396,
+ "eval_runtime": 34.9534,
+ "eval_samples_per_second": 14.305,
+ "eval_steps_per_second": 1.43,
+ "step": 3400
+ },
+ {
+ "epoch": 17.25,
+ "grad_norm": 0.9530749917030334,
+ "learning_rate": 1.3775000000000001e-05,
+ "loss": 0.1951,
+ "step": 3450
+ },
+ {
+ "epoch": 17.5,
+ "grad_norm": 0.9698864817619324,
+ "learning_rate": 1.2525000000000001e-05,
+ "loss": 0.1972,
+ "step": 3500
+ },
+ {
+ "epoch": 17.75,
+ "grad_norm": 0.9734402894973755,
+ "learning_rate": 1.1275000000000001e-05,
+ "loss": 0.1982,
+ "step": 3550
+ },
+ {
+ "epoch": 18.0,
+ "grad_norm": 0.9597378373146057,
+ "learning_rate": 1.0025000000000001e-05,
+ "loss": 0.2002,
+ "step": 3600
+ },
+ {
+ "epoch": 18.0,
+ "eval_loss": 0.3291071653366089,
+ "eval_runtime": 34.9444,
+ "eval_samples_per_second": 14.308,
+ "eval_steps_per_second": 1.431,
+ "step": 3600
+ },
+ {
+ "epoch": 18.25,
+ "grad_norm": 0.9978246688842773,
+ "learning_rate": 8.775e-06,
+ "loss": 0.1851,
+ "step": 3650
+ },
+ {
+ "epoch": 18.5,
+ "grad_norm": 1.0513160228729248,
+ "learning_rate": 7.525e-06,
+ "loss": 0.1874,
+ "step": 3700
+ },
+ {
+ "epoch": 18.75,
+ "grad_norm": 1.0297590494155884,
+ "learning_rate": 6.275e-06,
+ "loss": 0.1876,
+ "step": 3750
+ },
+ {
+ "epoch": 19.0,
+ "grad_norm": 1.0286418199539185,
+ "learning_rate": 5.025e-06,
+ "loss": 0.1883,
+ "step": 3800
+ },
+ {
+ "epoch": 19.0,
+ "eval_loss": 0.3394256830215454,
+ "eval_runtime": 34.9405,
+ "eval_samples_per_second": 14.31,
+ "eval_steps_per_second": 1.431,
+ "step": 3800
+ },
+ {
+ "epoch": 19.25,
+ "grad_norm": 1.0999506711959839,
+ "learning_rate": 3.775e-06,
+ "loss": 0.1794,
+ "step": 3850
+ },
+ {
+ "epoch": 19.5,
+ "grad_norm": 1.1211074590682983,
+ "learning_rate": 2.5250000000000004e-06,
+ "loss": 0.1784,
+ "step": 3900
+ },
+ {
+ "epoch": 19.75,
+ "grad_norm": 1.0398590564727783,
+ "learning_rate": 1.275e-06,
+ "loss": 0.1781,
+ "step": 3950
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 1.1269780397415161,
+ "learning_rate": 2.5000000000000002e-08,
+ "loss": 0.1784,
+ "step": 4000
+ },
+ {
+ "epoch": 20.0,
+ "eval_loss": 0.34511706233024597,
+ "eval_runtime": 34.9533,
+ "eval_samples_per_second": 14.305,
+ "eval_steps_per_second": 1.43,
+ "step": 4000
  }
  ],
  "logging_steps": 50,
+ "max_steps": 4000,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 20,
  "save_steps": 500,

  "should_evaluate": false,
  "should_log": false,
  "should_save": true,
+ "should_training_stop": true
  },
  "attributes": {}
  }
  },
+ "total_flos": 2.073044089536123e+18,
  "train_batch_size": 10,
  "trial_name": null,
  "trial_params": null
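The new trainer_state.json records a full 20-epoch run (global_step 4000), and its `best_global_step`/`best_metric` fields correspond to the `log_history` entry with the lowest `eval_loss`. A minimal sketch of recovering those fields from the log; the `best_eval_checkpoint` helper is an illustrative name, and the inlined `log` is an abbreviated subset of eval entries taken from the diff above:

```python
def best_eval_checkpoint(log_history):
    """Return (step, eval_loss) for the log entry with the lowest eval_loss."""
    evals = [e for e in log_history if "eval_loss" in e]  # skip training-loss entries
    best = min(evals, key=lambda e: e["eval_loss"])
    return best["step"], best["eval_loss"]

# Abbreviated log_history: a few eval entries from the new trainer_state.
log = [
    {"epoch": 5.0, "eval_loss": 0.283452570438385, "step": 1000},
    {"epoch": 6.0, "eval_loss": 0.28378918766975403, "step": 1200},
    {"epoch": 20.0, "eval_loss": 0.34511706233024597, "step": 4000},
]
step, loss = best_eval_checkpoint(log)
print(step, loss)
```

On the full log this reproduces `"best_global_step": 1000` and `"best_metric": 0.283452570438385`: eval loss bottoms out at epoch 5 and rises steadily afterward, which is why the best checkpoint is a quarter of the way through the run.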
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1ccfc48d5bf5d433a7901c2e32bcd41977a609c84a21fd4563bfa19f725b9990
- size 5240

  version https://git-lfs.github.com/spec/v1
+ oid sha256:f0c477509df8159054dc811265d4f27c236cef7ab115b1572faaba91e539c09f
+ size 5304