Initial push

Files changed (13)

README.md +209 -0
adapter_config.json +41 -0
adapter_model.safetensors +3 -0
chat_template.jinja +87 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +17 -0
tokenizer.json +0 -0
tokenizer.model +3 -0
tokenizer_config.json +0 -0
trainer_state.json +871 -0
training_args.bin +3 -0

README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: mistralai/Mistral-7B-Instruct-v0.3
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:mistralai/Mistral-7B-Instruct-v0.3
+- lora
+- sft
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "o_proj",
+    "down_proj",
+    "v_proj",
+    "k_proj",
+    "q_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
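The adapter config above sets `r=32`, `lora_alpha=64`, `lora_dropout=0.0` and `use_rslora=false`. In standard LoRA as implemented in PEFT, each targeted linear layer computes `W x + (lora_alpha / r) * B (A x)`, so the learned update here is scaled by 64 / 32 = 2.0 (with rsLoRA the factor would be `lora_alpha / sqrt(r)` instead). The pure-Python sketch below illustrates that forward pass on toy matrices. The matrix sizes and values are hypothetical, and `init_lora_weights: true` is modelled by zero-initialising `B` (PEFT's default), which makes the adapter a no-op before training.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Scale factor applied to the low-rank update B @ A."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, scaling: float) -> Vector:
    """y = W x + scaling * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scaling * d for b, d in zip(base, delta)]


# Values taken from adapter_config.json
scaling = lora_scaling(lora_alpha=64, r=32, use_rslora=False)
print(scaling)  # 2.0

# Hypothetical toy shapes: d_out = d_in = 3, rank 2
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]          # r x d_in
B_init = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # d_out x r, zeros at init
x = [1.0, 2.0, 3.0]

# With B = 0 the output equals the frozen base layer's output.
print(lora_forward(W, A, B_init, x, scaling))  # [1.0, 2.0, 3.0]
```

Because only `A` and `B` receive gradients, the trainable parameter count per module is `r * (d_in + d_out)` rather than `d_in * d_out`.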

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:34c3324138e7ddae147a9d378a2e46226aa95fba929cff69c84c3f506e7f7356
+size 335604696
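As a sanity check, the 335,604,696-byte `adapter_model.safetensors` is consistent with the LoRA config if the adapter weights are stored in fp32. The sketch below counts LoRA parameters as `r * (d_in + d_out)` for every module in `target_modules`. The Mistral-7B dimensions it uses (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128) are assumptions taken from the base model's published configuration, not from this commit, and fp32 storage is inferred from the file size rather than stated anywhere.

```python
# Assumed Mistral-7B-Instruct-v0.3 dimensions (base model config, not in this commit)
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # num_key_value_heads * head_dim

R = 32  # "r" in adapter_config.json

# (d_in, d_out) for every entry in target_modules
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(r: int) -> int:
    """A is (r x d_in) and B is (d_out x r), so each module adds r * (d_in + d_out)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in MODULE_SHAPES.values())
    return per_layer * NUM_LAYERS


n_params = lora_params(R)
fp32_bytes = n_params * 4
file_size = 335_604_696  # size recorded in the LFS pointer

print(n_params)                # 83886080
print(fp32_bytes)              # 335544320
print(file_size - fp32_bytes)  # 60376 bytes, presumably the safetensors JSON header
```

No bias terms are counted because the config has `"bias": "none"` and `"lora_bias": false`.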

chat_template.jinja ADDED

	@@ -0,0 +1,87 @@

+{%- if messages[0]["role"] == "system" %}
+    {%- set system_message = messages[0]["content"] %}
+    {%- set loop_messages = messages[1:] %}
+{%- else %}
+    {%- set loop_messages = messages %}
+{%- endif %}
+{%- if not tools is defined %}
+    {%- set tools = none %}
+{%- endif %}
+{%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
+{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
+{%- set ns = namespace() %}
+{%- set ns.index = 0 %}
+{%- for message in loop_messages %}
+    {%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
+        {%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
+            {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
+        {%- endif %}
+        {%- set ns.index = ns.index + 1 %}
+    {%- endif %}
+{%- endfor %}
+{{- bos_token }}
+{%- for message in loop_messages %}
+    {%- if message["role"] == "user" %}
+        {%- if tools is not none and (message == user_messages[-1]) %}
+            {{- "[AVAILABLE_TOOLS] [" }}
+            {%- for tool in tools %}
+                {%- set tool = tool.function %}
+                {{- '{"type": "function", "function": {' }}
+                {%- for key, val in tool.items() if key != "return" %}
+                    {%- if val is string %}
+                        {{- '"' + key + '": "' + val + '"' }}
+                    {%- else %}
+                        {{- '"' + key + '": ' + val|tojson }}
+                    {%- endif %}
+                    {%- if not loop.last %}
+                        {{- ", " }}
+                    {%- endif %}
+                {%- endfor %}
+                {{- "}}" }}
+                {%- if not loop.last %}
+                    {{- ", " }}
+                {%- else %}
+                    {{- "]" }}
+                {%- endif %}
+            {%- endfor %}
+            {{- "[/AVAILABLE_TOOLS]" }}
+            {%- endif %}
+        {%- if loop.last and system_message is defined %}
+            {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
+        {%- else %}
+            {{- "[INST] " + message["content"] + "[/INST]" }}
+        {%- endif %}
+    {%- elif message.tool_calls is defined and message.tool_calls is not none %}
+        {{- "[TOOL_CALLS] [" }}
+        {%- for tool_call in message.tool_calls %}
+            {%- set out = tool_call.function|tojson %}
+            {{- out[:-1] }}
+            {%- if not tool_call.id is defined or tool_call.id|length != 9 %}
+                {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
+            {%- endif %}
+            {{- ', "id": "' + tool_call.id + '"}' }}
+            {%- if not loop.last %}
+                {{- ", " }}
+            {%- else %}
+                {{- "]" + eos_token }}
+            {%- endif %}
+        {%- endfor %}
+    {%- elif message["role"] == "assistant" %}
+        {{- " " + message["content"]|trim + eos_token}}
+    {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
+        {%- if message.content is defined and message.content.content is defined %}
+            {%- set content = message.content.content %}
+        {%- else %}
+            {%- set content = message.content %}
+        {%- endif %}
+        {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
+        {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
+            {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
+        {%- endif %}
+        {{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
+    {%- else %}
+        {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
+    {%- endif %}
+{%- endfor %}
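For conversations without tools, the template above reduces to a simple string format: `bos_token`, then `[INST] ...[/INST]` for each user turn and ` answer</s>` for each assistant turn, with the system message prepended (followed by a blank line) only to the *last* message, and only if that message is a user turn. If the conversation ends on an assistant turn, the system message is silently dropped, and the template adds nothing after the final `[/INST]`. The function below is an illustrative pure-Python re-implementation of that tool-free branch so the resulting prompt can be inspected without a Jinja environment; it is not a substitute for the tokenizer's own chat-template rendering, it does not handle tool calls, and it assumes `<s>`/`</s>` as BOS/EOS, matching `special_tokens_map.json`.

```python
from typing import Dict, List

BOS, EOS = "<s>", "</s>"  # from special_tokens_map.json


def render_mistral_v3(messages: List[Dict[str, str]]) -> str:
    """Mirror the tool-free branch of chat_template.jinja (hypothetical helper)."""
    system_message = None
    loop_messages = messages
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]
        loop_messages = messages[1:]

    # After the optional system message, roles must alternate user/assistant/...
    for i, msg in enumerate(loop_messages):
        if (msg["role"] == "user") != (i % 2 == 0):
            raise ValueError("conversation roles must alternate user/assistant/...")

    out = BOS
    for i, msg in enumerate(loop_messages):
        is_last = i == len(loop_messages) - 1
        if msg["role"] == "user":
            if is_last and system_message is not None:
                # The system prompt is attached only to the final user turn.
                out += "[INST] " + system_message + "\n\n" + msg["content"] + "[/INST]"
            else:
                out += "[INST] " + msg["content"] + "[/INST]"
        elif msg["role"] == "assistant":
            # Jinja's `trim` strips surrounding whitespace, like str.strip().
            out += " " + msg["content"].strip() + EOS
        else:
            raise ValueError("only user and assistant roles are supported")
    return out


prompt = render_mistral_v3([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! "},
    {"role": "user", "content": "Bye"},
])
print(repr(prompt))
# '<s>[INST] Hi[/INST] Hello!</s>[INST] Be brief.\n\nBye[/INST]'
```

Placing the system prompt on the last user turn (rather than the first) means it is re-positioned on every new turn of a multi-turn conversation, which is worth keeping in mind when building SFT examples that should match inference-time prompts.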

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1182c3661d1975fbccf38ad2a3194a205e0e88b4233a81ce3271929d0b19fd66
+size 671467171
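`optimizer.pt` (671,467,171 bytes) is almost exactly twice the fp32 size of the LoRA weights, which is what one would expect from an Adam-family optimizer that keeps two fp32 moment buffers (`exp_avg` and `exp_avg_sq`) per trainable parameter. The optimizer type is not visible in the files shown here (`training_args.bin` is a binary pickle), so treat this as an inference. The trainable-parameter count below assumes the Mistral-7B dimensions (4096 hidden, 14336 MLP, 32 layers, 1024-wide k/v projections) together with the `r=32` config.

```python
# Assumed: trainable LoRA params = layers * r * sum(d_in + d_out) over the 7 target modules
R, LAYERS = 32, 32
per_layer_dims = (
    (4096 + 4096) * 2      # q_proj, o_proj
    + (4096 + 1024) * 2    # k_proj, v_proj
    + (4096 + 14336) * 3   # gate_proj, up_proj, down_proj
)
trainable = R * per_layer_dims * LAYERS

BYTES_FP32 = 4
ADAM_STATES = 2  # exp_avg and exp_avg_sq (assumed Adam/AdamW)

expected_state_bytes = trainable * ADAM_STATES * BYTES_FP32
optimizer_file = 671_467_171  # size recorded in the LFS pointer

print(trainable)                              # 83886080
print(expected_state_bytes)                   # 671088640
print(optimizer_file - expected_state_bytes)  # 378531 bytes, presumably serialization overhead
```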

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:12e15e837284f30841feeb4cb11a4ca47e6e0a0d43907e64044c865959176390
+size 14581

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e218fb5a057c8bad3409875137fd669abfbb61a30b591edea9a460e390fbcd1c
+size 1465

special_tokens_map.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": "</s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404

tokenizer_config.json ADDED

The diff for this file is too large to render.

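The `trainer_state.json` added in this commit logs an entry every 4 steps over a single epoch of 375 optimizer steps (`"epoch": 1.0`, `"global_step": 375`). Two properties can be read off its `log_history`: `num_tokens` grows by 131,072 per entry, i.e. 32,768 tokens per optimizer step (about 12.3M tokens for the epoch if that rate holds to the end), and the logged `learning_rate` values fit a linear decay from a peak of 1e-5 to zero with no warmup, where the value logged at step `s` equals `1e-5 * (376 - s) / 375`. The peak rate and schedule shape are inferred from the numbers, not read from `training_args.bin`. The sketch below checks both against a few entries copied verbatim from the log.

```python
# (step, learning_rate, num_tokens) entries copied from log_history
LOG = [
    (4, 9.920000000000002e-06, 131072.0),
    (100, 7.360000000000001e-06, 3276800.0),
    (200, 4.693333333333334e-06, 6553600.0),
    (300, 2.0266666666666666e-06, 9830400.0),
    (332, 1.1733333333333335e-06, 10878976.0),
]

PEAK_LR = 1e-5      # inferred from the log, not read from training_args.bin
TOTAL_STEPS = 375   # "global_step" at the end of the epoch


def linear_lr(step: int) -> float:
    """Linear decay to zero with no warmup. Matches the log if the value recorded
    at `step` is the rate after step - 1 scheduler updates (an inference)."""
    return PEAK_LR * (TOTAL_STEPS - (step - 1)) / TOTAL_STEPS


for step, logged_lr, tokens in LOG:
    assert abs(linear_lr(step) - logged_lr) < 1e-12, step
    assert tokens / step == 32768, step  # constant token count per optimizer step

tokens_per_step = LOG[-1][2] / LOG[-1][0]
print(tokens_per_step)                     # 32768.0
print(int(tokens_per_step * TOTAL_STEPS))  # 12288000
```

A constant 32,768 tokens per step suggests fixed-size (possibly packed) batches, but the batch size and sequence length themselves are not recorded in the files shown here.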
trainer_state.json ADDED

	@@ -0,0 +1,871 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 375,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.010666666666666666,
+      "grad_norm": 57.01100540161133,
+      "learning_rate": 9.920000000000002e-06,
+      "loss": 3.3007,
+      "mean_token_accuracy": 0.35129059516475536,
+      "num_tokens": 131072.0,
+      "step": 4
+    },
+    {
+      "epoch": 0.021333333333333333,
+      "grad_norm": 22.87227439880371,
+      "learning_rate": 9.813333333333333e-06,
+      "loss": 0.7338,
+      "mean_token_accuracy": 0.8888531178236008,
+      "num_tokens": 262144.0,
+      "step": 8
+    },
+    {
+      "epoch": 0.032,
+      "grad_norm": 31.548667907714844,
+      "learning_rate": 9.706666666666668e-06,
+      "loss": 0.3615,
+      "mean_token_accuracy": 0.9366130772978067,
+      "num_tokens": 393216.0,
+      "step": 12
+    },
+    {
+      "epoch": 0.042666666666666665,
+      "grad_norm": 2.3477976322174072,
+      "learning_rate": 9.600000000000001e-06,
+      "loss": 0.2559,
+      "mean_token_accuracy": 0.9562490303069353,
+      "num_tokens": 524288.0,
+      "step": 16
+    },
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 2.8429372310638428,
+      "learning_rate": 9.493333333333334e-06,
+      "loss": 0.2029,
+      "mean_token_accuracy": 0.9619030933827162,
+      "num_tokens": 655360.0,
+      "step": 20
+    },
+    {
+      "epoch": 0.064,
+      "grad_norm": 0.751723051071167,
+      "learning_rate": 9.386666666666668e-06,
+      "loss": 0.1878,
+      "mean_token_accuracy": 0.9634499195963144,
+      "num_tokens": 786432.0,
+      "step": 24
+    },
+    {
+      "epoch": 0.07466666666666667,
+      "grad_norm": 0.8143905997276306,
+      "learning_rate": 9.280000000000001e-06,
+      "loss": 0.1609,
+      "mean_token_accuracy": 0.9677910767495632,
+      "num_tokens": 917504.0,
+      "step": 28
+    },
+    {
+      "epoch": 0.08533333333333333,
+      "grad_norm": 0.5676524043083191,
+      "learning_rate": 9.173333333333334e-06,
+      "loss": 0.1498,
+      "mean_token_accuracy": 0.9682109300047159,
+      "num_tokens": 1048576.0,
+      "step": 32
+    },
+    {
+      "epoch": 0.096,
+      "grad_norm": 0.4019247591495514,
+      "learning_rate": 9.066666666666667e-06,
+      "loss": 0.1198,
+      "mean_token_accuracy": 0.973953602835536,
+      "num_tokens": 1179648.0,
+      "step": 36
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 2.094287633895874,
+      "learning_rate": 8.96e-06,
+      "loss": 0.1245,
+      "mean_token_accuracy": 0.9717862568795681,
+      "num_tokens": 1310720.0,
+      "step": 40
+    },
+    {
+      "epoch": 0.11733333333333333,
+      "grad_norm": 0.4548497498035431,
+      "learning_rate": 8.853333333333334e-06,
+      "loss": 0.1072,
+      "mean_token_accuracy": 0.9756607748568058,
+      "num_tokens": 1441792.0,
+      "step": 44
+    },
+    {
+      "epoch": 0.128,
+      "grad_norm": 0.4056798815727234,
+      "learning_rate": 8.746666666666667e-06,
+      "loss": 0.1117,
+      "mean_token_accuracy": 0.9738466665148735,
+      "num_tokens": 1572864.0,
+      "step": 48
+    },
+    {
+      "epoch": 0.13866666666666666,
+      "grad_norm": 0.46949484944343567,
+      "learning_rate": 8.64e-06,
+      "loss": 0.0993,
+      "mean_token_accuracy": 0.9766117427498102,
+      "num_tokens": 1703936.0,
+      "step": 52
+    },
+    {
+      "epoch": 0.14933333333333335,
+      "grad_norm": 0.5640748143196106,
+      "learning_rate": 8.533333333333335e-06,
+      "loss": 0.1021,
+      "mean_token_accuracy": 0.9750698395073414,
+      "num_tokens": 1835008.0,
+      "step": 56
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.5425636172294617,
+      "learning_rate": 8.426666666666667e-06,
+      "loss": 0.0941,
+      "mean_token_accuracy": 0.9774306155741215,
+      "num_tokens": 1966080.0,
+      "step": 60
+    },
+    {
+      "epoch": 0.17066666666666666,
+      "grad_norm": 0.4611673057079315,
+      "learning_rate": 8.32e-06,
+      "loss": 0.1007,
+      "mean_token_accuracy": 0.9752767980098724,
+      "num_tokens": 2097152.0,
+      "step": 64
+    },
+    {
+      "epoch": 0.18133333333333335,
+      "grad_norm": 0.3594709634780884,
+      "learning_rate": 8.213333333333335e-06,
+      "loss": 0.0918,
+      "mean_token_accuracy": 0.9777762982994318,
+      "num_tokens": 2228224.0,
+      "step": 68
+    },
+    {
+      "epoch": 0.192,
+      "grad_norm": 0.4486537575721741,
+      "learning_rate": 8.106666666666666e-06,
+      "loss": 0.0805,
+      "mean_token_accuracy": 0.9807284362614155,
+      "num_tokens": 2359296.0,
+      "step": 72
+    },
+    {
+      "epoch": 0.20266666666666666,
+      "grad_norm": 0.38870957493782043,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 0.0868,
+      "mean_token_accuracy": 0.97887940146029,
+      "num_tokens": 2490368.0,
+      "step": 76
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 0.30872026085853577,
+      "learning_rate": 7.893333333333335e-06,
+      "loss": 0.0797,
+      "mean_token_accuracy": 0.9803172368556261,
+      "num_tokens": 2621440.0,
+      "step": 80
+    },
+    {
+      "epoch": 0.224,
+      "grad_norm": 0.3934505581855774,
+      "learning_rate": 7.786666666666666e-06,
+      "loss": 0.0833,
+      "mean_token_accuracy": 0.9801280535757542,
+      "num_tokens": 2752512.0,
+      "step": 84
+    },
+    {
+      "epoch": 0.23466666666666666,
+      "grad_norm": 0.4293346107006073,
+      "learning_rate": 7.680000000000001e-06,
+      "loss": 0.0731,
+      "mean_token_accuracy": 0.9823248106986284,
+      "num_tokens": 2883584.0,
+      "step": 88
+    },
+    {
+      "epoch": 0.24533333333333332,
+      "grad_norm": 0.4050430655479431,
+      "learning_rate": 7.573333333333333e-06,
+      "loss": 0.0669,
+      "mean_token_accuracy": 0.9841717593371868,
+      "num_tokens": 3014656.0,
+      "step": 92
+    },
+    {
+      "epoch": 0.256,
+      "grad_norm": 0.4589104652404785,
+      "learning_rate": 7.4666666666666675e-06,
+      "loss": 0.0693,
+      "mean_token_accuracy": 0.9830129705369473,
+      "num_tokens": 3145728.0,
+      "step": 96
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 0.36902204155921936,
+      "learning_rate": 7.360000000000001e-06,
+      "loss": 0.0742,
+      "mean_token_accuracy": 0.9817552175372839,
+      "num_tokens": 3276800.0,
+      "step": 100
+    },
+    {
+      "epoch": 0.2773333333333333,
+      "grad_norm": 0.5726522207260132,
+      "learning_rate": 7.253333333333335e-06,
+      "loss": 0.0647,
+      "mean_token_accuracy": 0.984769131988287,
+      "num_tokens": 3407872.0,
+      "step": 104
+    },
+    {
+      "epoch": 0.288,
+      "grad_norm": 0.2904077470302582,
+      "learning_rate": 7.146666666666667e-06,
+      "loss": 0.0699,
+      "mean_token_accuracy": 0.9826978966593742,
+      "num_tokens": 3538944.0,
+      "step": 108
+    },
+    {
+      "epoch": 0.2986666666666667,
+      "grad_norm": 0.4607014060020447,
+      "learning_rate": 7.04e-06,
+      "loss": 0.0636,
+      "mean_token_accuracy": 0.9845926780253649,
+      "num_tokens": 3670016.0,
+      "step": 112
+    },
+    {
+      "epoch": 0.30933333333333335,
+      "grad_norm": 0.3433464467525482,
+      "learning_rate": 6.9333333333333344e-06,
+      "loss": 0.0676,
+      "mean_token_accuracy": 0.9838831946253777,
+      "num_tokens": 3801088.0,
+      "step": 116
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 0.3320010304450989,
+      "learning_rate": 6.826666666666667e-06,
+      "loss": 0.0613,
+      "mean_token_accuracy": 0.9851957242935896,
+      "num_tokens": 3932160.0,
+      "step": 120
+    },
+    {
+      "epoch": 0.33066666666666666,
+      "grad_norm": 0.3194180428981781,
+      "learning_rate": 6.720000000000001e-06,
+      "loss": 0.0615,
+      "mean_token_accuracy": 0.9851196780800819,
+      "num_tokens": 4063232.0,
+      "step": 124
+    },
+    {
+      "epoch": 0.3413333333333333,
+      "grad_norm": 0.3178844153881073,
+      "learning_rate": 6.613333333333334e-06,
+      "loss": 0.0602,
+      "mean_token_accuracy": 0.9855002630501986,
+      "num_tokens": 4194304.0,
+      "step": 128
+    },
+    {
+      "epoch": 0.352,
+      "grad_norm": 0.3220680058002472,
+      "learning_rate": 6.5066666666666665e-06,
+      "loss": 0.0603,
+      "mean_token_accuracy": 0.9854460209608078,
+      "num_tokens": 4325376.0,
+      "step": 132
+    },
+    {
+      "epoch": 0.3626666666666667,
+      "grad_norm": 0.3115042448043823,
+      "learning_rate": 6.4000000000000006e-06,
+      "loss": 0.0642,
+      "mean_token_accuracy": 0.9842616058886051,
+      "num_tokens": 4456448.0,
+      "step": 136
+    },
+    {
+      "epoch": 0.37333333333333335,
+      "grad_norm": 0.29545482993125916,
+      "learning_rate": 6.293333333333334e-06,
+      "loss": 0.0632,
+      "mean_token_accuracy": 0.9844904001802206,
+      "num_tokens": 4587520.0,
+      "step": 140
+    },
+    {
+      "epoch": 0.384,
+      "grad_norm": 0.2917477786540985,
+      "learning_rate": 6.186666666666668e-06,
+      "loss": 0.0633,
+      "mean_token_accuracy": 0.9846063815057278,
+      "num_tokens": 4718592.0,
+      "step": 144
+    },
+    {
+      "epoch": 0.39466666666666667,
+      "grad_norm": 0.2835116982460022,
+      "learning_rate": 6.08e-06,
+      "loss": 0.0645,
+      "mean_token_accuracy": 0.9845091681927443,
+      "num_tokens": 4849664.0,
+      "step": 148
+    },
+    {
+      "epoch": 0.4053333333333333,
+      "grad_norm": 0.5664941072463989,
+      "learning_rate": 5.973333333333334e-06,
+      "loss": 0.0637,
+      "mean_token_accuracy": 0.9839890114963055,
+      "num_tokens": 4980736.0,
+      "step": 152
+    },
+    {
+      "epoch": 0.416,
+      "grad_norm": 0.31269532442092896,
+      "learning_rate": 5.8666666666666675e-06,
+      "loss": 0.0629,
+      "mean_token_accuracy": 0.9847182333469391,
+      "num_tokens": 5111808.0,
+      "step": 156
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 0.23279497027397156,
+      "learning_rate": 5.76e-06,
+      "loss": 0.0634,
+      "mean_token_accuracy": 0.9844169486314058,
+      "num_tokens": 5242880.0,
+      "step": 160
+    },
+    {
+      "epoch": 0.43733333333333335,
+      "grad_norm": 0.3166351616382599,
+      "learning_rate": 5.653333333333334e-06,
+      "loss": 0.0591,
+      "mean_token_accuracy": 0.9855970963835716,
+      "num_tokens": 5373952.0,
+      "step": 164
+    },
+    {
+      "epoch": 0.448,
+      "grad_norm": 0.3708897531032562,
+      "learning_rate": 5.546666666666667e-06,
+      "loss": 0.0652,
+      "mean_token_accuracy": 0.9839488156139851,
+      "num_tokens": 5505024.0,
+      "step": 168
+    },
+    {
+      "epoch": 0.45866666666666667,
+      "grad_norm": 0.4217165410518646,
+      "learning_rate": 5.4400000000000004e-06,
+      "loss": 0.0609,
+      "mean_token_accuracy": 0.9852561354637146,
+      "num_tokens": 5636096.0,
+      "step": 172
+    },
+    {
+      "epoch": 0.4693333333333333,
+      "grad_norm": 0.25086694955825806,
+      "learning_rate": 5.333333333333334e-06,
+      "loss": 0.0583,
+      "mean_token_accuracy": 0.9855273645371199,
+      "num_tokens": 5767168.0,
+      "step": 176
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 0.22864191234111786,
+      "learning_rate": 5.226666666666667e-06,
+      "loss": 0.0573,
+      "mean_token_accuracy": 0.9858840573579073,
+      "num_tokens": 5898240.0,
+      "step": 180
+    },
+    {
+      "epoch": 0.49066666666666664,
+      "grad_norm": 0.23961994051933289,
+      "learning_rate": 5.12e-06,
+      "loss": 0.0597,
+      "mean_token_accuracy": 0.985382217913866,
+      "num_tokens": 6029312.0,
+      "step": 184
+    },
+    {
+      "epoch": 0.5013333333333333,
+      "grad_norm": 0.33800604939460754,
+      "learning_rate": 5.013333333333333e-06,
+      "loss": 0.063,
+      "mean_token_accuracy": 0.9846932347863913,
+      "num_tokens": 6160384.0,
+      "step": 188
+    },
+    {
+      "epoch": 0.512,
+      "grad_norm": 0.25232720375061035,
+      "learning_rate": 4.9066666666666666e-06,
+      "loss": 0.0572,
+      "mean_token_accuracy": 0.9861765094101429,
+      "num_tokens": 6291456.0,
+      "step": 192
+    },
+    {
+      "epoch": 0.5226666666666666,
+      "grad_norm": 0.24498237669467926,
+      "learning_rate": 4.800000000000001e-06,
+      "loss": 0.0601,
+      "mean_token_accuracy": 0.9852175824344158,
+      "num_tokens": 6422528.0,
+      "step": 196
+    },
+    {
+      "epoch": 0.5333333333333333,
+      "grad_norm": 0.2261447161436081,
+      "learning_rate": 4.693333333333334e-06,
+      "loss": 0.0593,
+      "mean_token_accuracy": 0.985989149659872,
+      "num_tokens": 6553600.0,
+      "step": 200
+    },
+    {
+      "epoch": 0.544,
+      "grad_norm": 0.2790498435497284,
+      "learning_rate": 4.586666666666667e-06,
+      "loss": 0.0643,
+      "mean_token_accuracy": 0.9841120839118958,
+      "num_tokens": 6684672.0,
+      "step": 204
+    },
+    {
+      "epoch": 0.5546666666666666,
+      "grad_norm": 0.2930603623390198,
+      "learning_rate": 4.48e-06,
+      "loss": 0.0599,
+      "mean_token_accuracy": 0.9853447061032057,
+      "num_tokens": 6815744.0,
+      "step": 208
+    },
+    {
+      "epoch": 0.5653333333333334,
+      "grad_norm": 0.29962053894996643,
+      "learning_rate": 4.3733333333333335e-06,
+      "loss": 0.0616,
+      "mean_token_accuracy": 0.985086927190423,
+      "num_tokens": 6946816.0,
+      "step": 212
+    },
+    {
+      "epoch": 0.576,
+      "grad_norm": 0.26049676537513733,
+      "learning_rate": 4.266666666666668e-06,
+      "loss": 0.0605,
+      "mean_token_accuracy": 0.9852338843047619,
+      "num_tokens": 7077888.0,
+      "step": 216
+    },
+    {
+      "epoch": 0.5866666666666667,
+      "grad_norm": 0.24306631088256836,
+      "learning_rate": 4.16e-06,
+      "loss": 0.062,
+      "mean_token_accuracy": 0.9845094550400972,
+      "num_tokens": 7208960.0,
+      "step": 220
+    },
+    {
+      "epoch": 0.5973333333333334,
+      "grad_norm": 0.24954764544963837,
+      "learning_rate": 4.053333333333333e-06,
+      "loss": 0.0593,
+      "mean_token_accuracy": 0.9854345787316561,
+      "num_tokens": 7340032.0,
+      "step": 224
+    },
+    {
+      "epoch": 0.608,
+      "grad_norm": 0.24120154976844788,
+      "learning_rate": 3.946666666666667e-06,
+      "loss": 0.0536,
+      "mean_token_accuracy": 0.9870329741388559,
+      "num_tokens": 7471104.0,
+      "step": 228
+    },
+    {
+      "epoch": 0.6186666666666667,
+      "grad_norm": 0.3049578368663788,
+      "learning_rate": 3.8400000000000005e-06,
+      "loss": 0.0563,
+      "mean_token_accuracy": 0.9861820172518492,
+      "num_tokens": 7602176.0,
+      "step": 232
+    },
+    {
+      "epoch": 0.6293333333333333,
+      "grad_norm": 0.22811713814735413,
+      "learning_rate": 3.7333333333333337e-06,
+      "loss": 0.0555,
+      "mean_token_accuracy": 0.9866867158561945,
+      "num_tokens": 7733248.0,
+      "step": 236
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 0.2259177416563034,
+      "learning_rate": 3.6266666666666674e-06,
+      "loss": 0.0603,
+      "mean_token_accuracy": 0.9851241298019886,
+      "num_tokens": 7864320.0,
+      "step": 240
+    },
+    {
+      "epoch": 0.6506666666666666,
+      "grad_norm": 0.23514041304588318,
+      "learning_rate": 3.52e-06,
+      "loss": 0.0567,
+      "mean_token_accuracy": 0.98579515889287,
+      "num_tokens": 7995392.0,
+      "step": 244
+    },
+    {
+      "epoch": 0.6613333333333333,
+      "grad_norm": 0.2601166367530823,
+      "learning_rate": 3.4133333333333334e-06,
+      "loss": 0.0581,
+      "mean_token_accuracy": 0.9856137670576572,
+      "num_tokens": 8126464.0,
+      "step": 248
+    },
+    {
+      "epoch": 0.672,
+      "grad_norm": 0.4092627763748169,
+      "learning_rate": 3.306666666666667e-06,
+      "loss": 0.0578,
+      "mean_token_accuracy": 0.985778022557497,
+      "num_tokens": 8257536.0,
+      "step": 252
+    },
+    {
+      "epoch": 0.6826666666666666,
+      "grad_norm": 0.22616159915924072,
+      "learning_rate": 3.2000000000000003e-06,
+      "loss": 0.0612,
+      "mean_token_accuracy": 0.9848431646823883,
+      "num_tokens": 8388608.0,
+      "step": 256
+    },
+    {
+      "epoch": 0.6933333333333334,
+      "grad_norm": 0.274242103099823,
+      "learning_rate": 3.093333333333334e-06,
+      "loss": 0.0584,
+      "mean_token_accuracy": 0.9856515768915415,
+      "num_tokens": 8519680.0,
+      "step": 260
+    },
+    {
+      "epoch": 0.704,
+      "grad_norm": 0.23342455923557281,
+      "learning_rate": 2.986666666666667e-06,
+      "loss": 0.0503,
+      "mean_token_accuracy": 0.9877575300633907,
+      "num_tokens": 8650752.0,
+      "step": 264
+    },
+    {
+      "epoch": 0.7146666666666667,
+      "grad_norm": 0.2760745584964752,
+      "learning_rate": 2.88e-06,
+      "loss": 0.0577,
+      "mean_token_accuracy": 0.9861479960381985,
+      "num_tokens": 8781824.0,
+      "step": 268
+    },
+    {
+      "epoch": 0.7253333333333334,
+      "grad_norm": 0.26444053649902344,
+      "learning_rate": 2.7733333333333336e-06,
+      "loss": 0.0589,
+      "mean_token_accuracy": 0.9855729006230831,
+      "num_tokens": 8912896.0,
+      "step": 272
+    },
+    {
+      "epoch": 0.736,
+      "grad_norm": 0.3007858693599701,
+      "learning_rate": 2.666666666666667e-06,
+      "loss": 0.0583,
+      "mean_token_accuracy": 0.9853702746331692,
+      "num_tokens": 9043968.0,
+      "step": 276
+    },
+    {
+      "epoch": 0.7466666666666667,
+      "grad_norm": 0.29359951615333557,
+      "learning_rate": 2.56e-06,
+      "loss": 0.0568,
+      "mean_token_accuracy": 0.9862662479281425,
+      "num_tokens": 9175040.0,
+      "step": 280
+    },
+    {
+      "epoch": 0.7573333333333333,
+      "grad_norm": 0.5619154572486877,
+      "learning_rate": 2.4533333333333333e-06,
+      "loss": 0.0576,
+      "mean_token_accuracy": 0.9858071394264698,
+      "num_tokens": 9306112.0,
+      "step": 284
+    },
+    {
+      "epoch": 0.768,
+      "grad_norm": 0.21443450450897217,
+      "learning_rate": 2.346666666666667e-06,
+      "loss": 0.0593,
+      "mean_token_accuracy": 0.985109519213438,
+      "num_tokens": 9437184.0,
+      "step": 288
+    },
+    {
+      "epoch": 0.7786666666666666,
+      "grad_norm": 0.2572067975997925,
+      "learning_rate": 2.24e-06,
+      "loss": 0.0611,
+      "mean_token_accuracy": 0.9849242977797985,
+      "num_tokens": 9568256.0,
+      "step": 292
+    },
+    {
+      "epoch": 0.7893333333333333,
+      "grad_norm": 0.29180145263671875,
+      "learning_rate": 2.133333333333334e-06,
+      "loss": 0.0549,
+      "mean_token_accuracy": 0.9862227980047464,
+      "num_tokens": 9699328.0,
+      "step": 296
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 0.23858007788658142,
+      "learning_rate": 2.0266666666666666e-06,
+      "loss": 0.0543,
+      "mean_token_accuracy": 0.986565887928009,
+      "num_tokens": 9830400.0,
+      "step": 300
+    },
+    {
+      "epoch": 0.8106666666666666,
+      "grad_norm": 0.2548295259475708,
+      "learning_rate": 1.9200000000000003e-06,
+      "loss": 0.0605,
+      "mean_token_accuracy": 0.9851347785443068,
+      "num_tokens": 9961472.0,
+      "step": 304
+    },
+    {
+      "epoch": 0.8213333333333334,
+      "grad_norm": 0.2322961390018463,
+      "learning_rate": 1.8133333333333337e-06,
+      "loss": 0.0546,
+      "mean_token_accuracy": 0.9861893225461245,
+      "num_tokens": 10092544.0,
+      "step": 308
+    },
+    {
+      "epoch": 0.832,
+      "grad_norm": 0.26138052344322205,
+      "learning_rate": 1.7066666666666667e-06,
+      "loss": 0.0553,
+      "mean_token_accuracy": 0.9866313245147467,
+      "num_tokens": 10223616.0,
+      "step": 312
+    },
+    {
+      "epoch": 0.8426666666666667,
+      "grad_norm": 0.27364590764045715,
+      "learning_rate": 1.6000000000000001e-06,
+      "loss": 0.0617,
+      "mean_token_accuracy": 0.9845649115741253,
+      "num_tokens": 10354688.0,
+      "step": 316
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 0.23869001865386963,
+      "learning_rate": 1.4933333333333336e-06,
+      "loss": 0.0549,
+      "mean_token_accuracy": 0.9865992218255997,
+      "num_tokens": 10485760.0,
+      "step": 320
+    },
+    {
+      "epoch": 0.864,
+      "grad_norm": 0.2767007052898407,
+      "learning_rate": 1.3866666666666668e-06,
+      "loss": 0.0592,
+      "mean_token_accuracy": 0.9854040741920471,
+      "num_tokens": 10616832.0,
+      "step": 324
+    },
+    {
+      "epoch": 0.8746666666666667,
+      "grad_norm": 0.21002186834812164,
+      "learning_rate": 1.28e-06,
+      "loss": 0.0571,
+      "mean_token_accuracy": 0.9858873914927244,
+      "num_tokens": 10747904.0,
+      "step": 328
+    },
+    {
+      "epoch": 0.8853333333333333,
+      "grad_norm": 0.21181665360927582,
+      "learning_rate": 1.1733333333333335e-06,
+      "loss": 0.0521,
+      "mean_token_accuracy": 0.9871898032724857,
+      "num_tokens": 10878976.0,
+      "step": 332
+    },
+    {
+      "epoch": 0.896,
+      "grad_norm": 0.31921684741973877,
+      "learning_rate": 1.066666666666667e-06,
+      "loss": 0.0563,
+      "mean_token_accuracy": 0.9860129542648792,
+      "num_tokens": 11010048.0,
+      "step": 336
+    },
+    {
+      "epoch": 0.9066666666666666,
+      "grad_norm": 0.2304464429616928,
+      "learning_rate": 9.600000000000001e-07,
+      "loss": 0.0547,
+      "mean_token_accuracy": 0.9868404492735863,
+      "num_tokens": 11141120.0,
+      "step": 340
+    },
+    {
+      "epoch": 0.9173333333333333,
+      "grad_norm": 0.2787565290927887,
+      "learning_rate": 8.533333333333334e-07,
+      "loss": 0.0527,
+      "mean_token_accuracy": 0.9869417436420918,
+      "num_tokens": 11272192.0,
+      "step": 344
+    },
+    {
+      "epoch": 0.928,
+      "grad_norm": 0.2605002224445343,
+      "learning_rate": 7.466666666666668e-07,
+      "loss": 0.0584,
+      "mean_token_accuracy": 0.985166372731328,
+      "num_tokens": 11403264.0,
+      "step": 348
+    },
+    {
+      "epoch": 0.9386666666666666,
+      "grad_norm": 0.30631157755851746,
+      "learning_rate": 6.4e-07,
+      "loss": 0.0559,
+      "mean_token_accuracy": 0.9864703621715307,
+      "num_tokens": 11534336.0,
+      "step": 352
+    },
+    {
+      "epoch": 0.9493333333333334,
+      "grad_norm": 0.22370117902755737,
+      "learning_rate": 5.333333333333335e-07,
+      "loss": 0.0607,
+      "mean_token_accuracy": 0.9853631183505058,
+      "num_tokens": 11665408.0,
+      "step": 356
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 0.2398754060268402,
+      "learning_rate": 4.266666666666667e-07,
+      "loss": 0.0553,
+      "mean_token_accuracy": 0.9864100068807602,
+      "num_tokens": 11796480.0,
+      "step": 360
+    },
+    {
+      "epoch": 0.9706666666666667,
+      "grad_norm": 0.259531170129776,
+      "learning_rate": 3.2e-07,
+      "loss": 0.055,
+      "mean_token_accuracy": 0.9865185879170895,
+      "num_tokens": 11927552.0,
+      "step": 364
+    },
+    {
+      "epoch": 0.9813333333333333,
+      "grad_norm": 0.2931111752986908,
+      "learning_rate": 2.1333333333333334e-07,
+      "loss": 0.0596,
+      "mean_token_accuracy": 0.9853004887700081,
+      "num_tokens": 12058624.0,
+      "step": 368
+    },
+    {
+      "epoch": 0.992,
+      "grad_norm": 0.22077959775924683,
+      "learning_rate": 1.0666666666666667e-07,
+      "loss": 0.0572,
+      "mean_token_accuracy": 0.985428512096405,
+      "num_tokens": 12189696.0,
+      "step": 372
+    }
+  ],
+  "logging_steps": 4,
+  "max_steps": 375,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 5000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 5.30671428698112e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
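The logged values above are internally consistent, and they can be checked against each other. Three relationships hold:

- `epoch` equals `step / max_steps` (375).
- `num_tokens` grows by a fixed 32,768 tokens per optimizer step.
- `learning_rate` drops by the same amount every 4 logged steps.

The script below is a minimal sketch that checks these relationships. It assumes a linear decay to zero from a peak of 1e-5, with no warmup, where the logged rate is the one applied during that step. The peak value and the schedule type are assumptions inferred from the numbers. They are not read from `training_args.bin`, which is only an LFS pointer here. The entries are copied verbatim from the log above. `ASSUMED_PEAK_LR`, `linear_lr` and `check_log` are hypothetical names.

```python
import math

# Excerpt of log_history entries copied from trainer_state.json above.
LOG_EXCERPT = [
    {"epoch": 0.736, "learning_rate": 2.666666666666667e-06, "num_tokens": 9043968.0, "step": 276},
    {"epoch": 0.8, "learning_rate": 2.0266666666666666e-06, "num_tokens": 9830400.0, "step": 300},
    {"epoch": 0.96, "learning_rate": 4.266666666666667e-07, "num_tokens": 11796480.0, "step": 360},
    {"epoch": 0.992, "learning_rate": 1.0666666666666667e-07, "num_tokens": 12189696.0, "step": 372},
]
MAX_STEPS = 375          # "max_steps" in trainer_state.json
ASSUMED_PEAK_LR = 1e-5   # hypothetical: inferred from the log, not recorded in this file


def linear_lr(step, peak_lr=ASSUMED_PEAK_LR, max_steps=MAX_STEPS):
    """LR applied during optimizer step `step` (1-indexed), assuming
    linear decay to zero with no warmup."""
    return peak_lr * (max_steps - (step - 1)) / max_steps


def check_log(entries):
    """Check epoch, token count and LR of each entry; return tokens per step."""
    tokens_per_step = entries[0]["num_tokens"] / entries[0]["step"]
    for e in entries:
        # Single-epoch run: epoch is the fraction of max_steps completed.
        assert math.isclose(e["epoch"], e["step"] / MAX_STEPS, rel_tol=1e-9)
        # A constant number of tokens is consumed per optimizer step.
        assert math.isclose(e["num_tokens"], tokens_per_step * e["step"], rel_tol=1e-12)
        # The logged LR matches the assumed linear schedule.
        assert math.isclose(e["learning_rate"], linear_lr(e["step"]), rel_tol=1e-6)
    return tokens_per_step


print(check_log(LOG_EXCERPT))  # 32768.0
```

To check the full log, load the file with `json.load` and pass its `log_history` list, keeping only the entries that contain a `learning_rate` key.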

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a5059008c70d1c185a1ed4ef672b5fd841e769c99f2384bbc210e01adb553a9
+size 6225