Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +180 -166
README.md CHANGED
@@ -1,166 +1,180 @@
- ---
- library_name: transformers
- license: apache-2.0
- base_model: Qwen/Qwen2.5-7B-Instruct
- tags:
- - axolotl
- - generated_from_trainer
- datasets:
- - mb_base.jsonl
- model-index:
- - name: merged-bench-train-base
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.8.0`
- ```yaml
- base_model: Qwen/Qwen2.5-7B-Instruct
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
- trust_remote_code: false
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- output_dir: ./outputs/out
- chat_template: qwen_25
- datasets:
-   - path: mb_base.jsonl
-     type: chat_template
-     field_messages: messages
-     message_field_role: role
-     message_field_content: content
-     roles:
-       system:
-         - system
-       user:
-         - user
-       assistant:
-         - assistant
-
- dataset_prepared_path: last_run_prepared
- val_set_size: 0.005
- output_dir: ./outputs/out
- eval_sample_packing: False
-
- sequence_len: 8192
- sample_packing: False
- pad_to_sequence_len: False
-
- wandb_project: mergedbench
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
- hub_model_id: amphora/merged-bench-train-base
-
- plugins:
-   - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_swiglu: true
- liger_fused_linear_cross_entropy: true
-
- gradient_accumulation_steps: 2
- micro_batch_size: 8
- eval_batch_size: 4
- num_epochs: 3
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 2e-5
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
-   use_reentrant: false
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 30
- evals_per_epoch: 3
- eval_max_new_tokens: 128
- eval_table_size:
- saves_per_epoch: 1
- debug:
- deepspeed: deepspeed_configs/zero3_bf16.json
- weight_decay: 0.01
- fsdp:
- fsdp_config:
- special_tokens:
- ```
-
- </details><br>
-
- # merged-bench-train-base
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the mb_base.jsonl dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.3173
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 64
- - total_eval_batch_size: 16
- - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 30
- - num_epochs: 3.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.0382 | 0.0059 | 1 | 1.0319 |
- | 0.3455 | 0.3373 | 57 | 0.3270 |
- | 0.3169 | 0.6746 | 114 | 0.3173 |
- | 0.2116 | 1.0118 | 171 | 0.3009 |
- | 0.2064 | 1.3491 | 228 | 0.3020 |
- | 0.1871 | 1.6864 | 285 | 0.2955 |
- | 0.1069 | 2.0237 | 342 | 0.2880 |
- | 0.1014 | 2.3609 | 399 | 0.3192 |
- | 0.0955 | 2.6982 | 456 | 0.3173 |
-
-
- ### Framework versions
-
- - Transformers 4.51.0
- - Pytorch 2.6.0+cu124
- - Datasets 3.5.0
- - Tokenizers 0.21.1
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ tags:
+ - axolotl
+ - generated_from_trainer
+ datasets:
+ - mb_base.jsonl
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: merged-bench-train-base
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.8.0`
+ ```yaml
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+ trust_remote_code: false
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ output_dir: ./outputs/out
+ chat_template: qwen_25
+ datasets:
+   - path: mb_base.jsonl
+     type: chat_template
+     field_messages: messages
+     message_field_role: role
+     message_field_content: content
+     roles:
+       system:
+         - system
+       user:
+         - user
+       assistant:
+         - assistant
+
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.005
+ output_dir: ./outputs/out
+ eval_sample_packing: False
+
+ sequence_len: 8192
+ sample_packing: False
+ pad_to_sequence_len: False
+
+ wandb_project: mergedbench
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+ hub_model_id: amphora/merged-bench-train-base
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ gradient_accumulation_steps: 2
+ micro_batch_size: 8
+ eval_batch_size: 4
+ num_epochs: 3
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 2e-5
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 30
+ evals_per_epoch: 3
+ eval_max_new_tokens: 128
+ eval_table_size:
+ saves_per_epoch: 1
+ debug:
+ deepspeed: deepspeed_configs/zero3_bf16.json
+ weight_decay: 0.01
+ fsdp:
+ fsdp_config:
+ special_tokens:
+ ```
+
+ </details><br>
+
+ # merged-bench-train-base
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the mb_base.jsonl dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.3173
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 8
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 16
+ - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 30
+ - num_epochs: 3.0
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.0382 | 0.0059 | 1 | 1.0319 |
+ | 0.3455 | 0.3373 | 57 | 0.3270 |
+ | 0.3169 | 0.6746 | 114 | 0.3173 |
+ | 0.2116 | 1.0118 | 171 | 0.3009 |
+ | 0.2064 | 1.3491 | 228 | 0.3020 |
+ | 0.1871 | 1.6864 | 285 | 0.2955 |
+ | 0.1069 | 2.0237 | 342 | 0.2880 |
+ | 0.1014 | 2.3609 | 399 | 0.3192 |
+ | 0.0955 | 2.6982 | 456 | 0.3173 |
+
+
+ ### Framework versions
+
+ - Transformers 4.51.0
+ - Pytorch 2.6.0+cu124
+ - Datasets 3.5.0
+ - Tokenizers 0.21.1
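
The axolotl config in the card above reads `mb_base.jsonl` as a `chat_template` dataset whose records carry a `messages` list with `role` and `content` keys (per `field_messages`, `message_field_role`, and `message_field_content`). A minimal sketch of one record in that schema; the conversation text is a made-up placeholder, not taken from the actual dataset:

```python
import json

# One record in the shape the axolotl `chat_template` dataset spec expects:
# a "messages" list whose items carry "role" and "content" keys.
# The conversation below is a placeholder for illustration only.
record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Name the largest planet in the solar system."},
        {"role": "assistant", "content": "Jupiter is the largest planet in the solar system."},
    ]
}

# Append the record as one JSON line, the JSONL format referenced by `path: mb_base.jsonl`.
with open("mb_base.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```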
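
For the fine-tuned checkpoint itself, a minimal inference sketch with `transformers`, assuming the weights are available under the `hub_model_id` from the config (`amphora/merged-bench-train-base`) and that the tokenizer ships the Qwen2.5 chat template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the trained weights were pushed to the hub_model_id in the config;
# point this at a local checkpoint directory otherwise.
model_id = "amphora/merged-bench-train-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give a one-sentence summary of the Qwen2.5 model family."},
]

# The tokenizer's chat template corresponds to `chat_template: qwen_25` in the config.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```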