lbourdois committed · Commit 9e14394 · verified · 1 Parent(s): a29cf36

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +196 -182
README.md CHANGED
@@ -1,183 +1,197 @@
1
- ---
2
- library_name: transformers
3
- license: apache-2.0
4
- base_model: Qwen/Qwen2.5-14B
5
- model-index:
6
- - name: LLaMutation-Qwen2.5-14B-SFFT-v0.0
7
- results: []
8
- ---
9
-
10
- # LLaMutation-Qwen2.5-14B-SFFT-v0.0
11
-
12
- ![image/webp](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/IFK02cTih72zfZfT5UY4f.webp)
13
-
14
- This model is a [Spectrum](https://github.com/axolotl-ai-cloud/axolotl/blob/67f744dc8c9564ef7a42d5df780ae53e319dca61/src/axolotl/integrations/spectrum/README.md) FFT of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on a code translation dataset evolved with [EvolKit](https://github.com/arcee-ai/EvolKit).
15
-
16
- ## Model description
17
-
18
- Code translation and completion model trained on Qwen2.5-14B, as there is not yet a Qwen2.5-Coder-14B model. This is 100% an alpha completion model, so there will be quirks in its usage parameters.
19
-
20
- I will refine the model for completion and also create an instruct/chat variant.
21
-
22
- ## Intended uses & limitations
23
-
24
- Use differing system prompts for code translation, or use the model for tab autocomplete with [continue.dev](https://www.continue.dev/).
25
-
26
- ## Chat template and sampling parameters
27
-
28
- Chat template is chatml.
29
-
30
- Sampling parameters used for generation and for the hackathon demo are shown below:
31
-
32
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/YzQ8nqu83lEhl3Kg4u0PC.png)
33
-
34
- ### THE FOLLOWING SYSTEM PROMPT MUST BE USED FOR THIS MODEL
35
-
36
- `You are an AI assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non-code text to the minimum possible. DO NOT REPEAT ANY NON-CODE TEXT. ONLY PRINT OUT CODE ONCE, DO NOT ITERATE!`
37
-
38
- ## Training procedure
39
-
40
- Spectrum FFT/SFFT
41
-
42
- ### Training hyperparameters
43
-
44
- The following hyperparameters were used during training:
45
- - learning_rate: 0.0005
46
- - train_batch_size: 1
47
- - eval_batch_size: 1
48
- - seed: 42
49
- - distributed_type: multi-GPU
50
- - num_devices: 8
51
- - gradient_accumulation_steps: 4
52
- - total_train_batch_size: 32
53
- - total_eval_batch_size: 8
54
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
55
- - lr_scheduler_type: linear
56
- - lr_scheduler_warmup_steps: 50
57
- - num_epochs: 1
58
-
59
- ### Training results
60
-
61
- | Training Loss | Epoch | Step | Validation Loss |
62
- |:-------------:|:------:|:----:|:---------------:|
63
- | 0.3948 | 0.0237 | 1 | 0.3920 |
64
- | 0.2392 | 0.4970 | 21 | 0.2500 |
65
- | 0.2606 | 0.9941 | 42 | 0.2621 |
66
-
67
-
68
- ### Framework versions
69
-
70
- - Transformers 4.45.2
71
- - Pytorch 2.3.1+cu121
72
- - Datasets 3.0.1
73
- - Tokenizers 0.20.1
74
-
75
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
76
- <details><summary>See axolotl config</summary>
77
-
78
- axolotl version: `0.4.1`
79
- ```yaml
80
- base_model: Qwen/Qwen2.5-14B
81
-
82
- load_in_8bit: false
83
- load_in_4bit: false
84
- strict: false
85
-
86
- plugins:
87
- - axolotl.integrations.liger.LigerPlugin
88
- liger_rope: true
89
- liger_rms_norm: true
90
- liger_swiglu: true
91
- liger_fused_linear_cross_entropy: true
92
-
93
- plugins:
94
- - axolotl.integrations.spectrum.SpectrumPlugin
95
-
96
- spectrum_top_fraction: 0.5
97
- # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
98
- spectrum_model_name: Qwen/Qwen2.5-14B
99
-
100
- datasets:
101
- - path: datasets/LLaMutation.jsonl
102
- type: sharegpt
103
- - path: datasets/LLaMutationMAX_Train.json
104
- type: sharegpt
105
-
106
- chat_template: chatml
107
- shuffle_merged_datasets: true
108
- val_set_size: 0.1
109
- output_dir: ./LLaMutation-Qwen2.5-14B-SFFT-v0.0
110
-
111
- sequence_len: 8192
112
- sample_packing: true
113
- eval_sample_packing: true
114
- pad_to_sequence_len: true
115
-
116
- # adapter: qlora
117
- # lora_model_dir:
118
- # lora_r: 32
119
- # lora_alpha: 16
120
- # lora_dropout: 0.05
121
- # lora_target_linear: true
122
- # peft_use_dora: true
123
-
124
- wandb_project: LLaMutation-Qwen2.5-14B-SFFT-v0.0
125
- wandb_entity:
126
- wandb_watch:
127
- wandb_name: Unit-00
128
- wandb_log_model:
129
-
130
- gradient_accumulation_steps: 4
131
- micro_batch_size: 1
132
- num_epochs: 1
133
- optimizer: adamw_torch
134
- lr_scheduler: linear
135
- learning_rate: 0.0005
136
- max_grad_norm: 3
137
-
138
- train_on_inputs: false
139
- group_by_length: false
140
- bf16: auto
141
- fp16:
142
- tf32: true
143
-
144
- gradient_checkpointing: true
145
- gradient_checkpointing_kwargs:
146
- use_reentrant: true
147
- early_stopping_patience:
148
- resume_from_checkpoint:
149
- local_rank:
150
- logging_steps: 1
151
- xformers_attention:
152
- flash_attention: true
153
-
154
- warmup_steps: 50
155
- evals_per_epoch: 2
156
- saves_per_epoch: 2
157
- save_safetensors: true
158
- hub_model_id:
159
- hub_strategy:
160
- debug:
161
- deepspeed: deepspeed_configs/zero3_bf16.json
162
- weight_decay: 0.1
163
- # fsdp:
164
- # - full_shard
165
- # - auto_wrap
166
- # fsdp_config:
167
- # fsdp_limit_all_gathers: true
168
- # fsdp_sync_module_states: true
169
- # fsdp_offload_params: false # Changed from true
170
- # fsdp_use_orig_params: true # Changed from false
171
- # fsdp_cpu_ram_efficient_loading: true
172
- # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
173
- # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
174
- # fsdp_activation_checkpointing: true
175
- # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
176
- # fsdp_sharding_strategy: FULL_SHARD
177
- # fsdp_forward_prefetch: true # Added
178
- # fsdp_backward_prefetch: "BACKWARD_POST" # Added
179
- # fsdp_backward_prefetch_limit: 1 # Added
180
- # fsdp_mixed_precision: BF16 # Added
181
- ```
182
-
183
  </details><br>
 
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model: Qwen/Qwen2.5-14B
5
+ language:
6
+ - zho
7
+ - eng
8
+ - fra
9
+ - spa
10
+ - por
11
+ - deu
12
+ - ita
13
+ - rus
14
+ - jpn
15
+ - kor
16
+ - vie
17
+ - tha
18
+ - ara
19
+ model-index:
20
+ - name: LLaMutation-Qwen2.5-14B-SFFT-v0.0
21
+ results: []
22
+ ---
23
+
24
+ # LLaMutation-Qwen2.5-14B-SFFT-v0.0
25
+
26
+ ![image/webp](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/IFK02cTih72zfZfT5UY4f.webp)
27
+
28
+ This model is a [Spectrum](https://github.com/axolotl-ai-cloud/axolotl/blob/67f744dc8c9564ef7a42d5df780ae53e319dca61/src/axolotl/integrations/spectrum/README.md) FFT of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on a code translation dataset evolved with [EvolKit](https://github.com/arcee-ai/EvolKit).
29
+
30
+ ## Model description
31
+
32
+ Code translation and completion model trained on Qwen2.5-14B, as there is not yet a Qwen2.5-Coder-14B model. This is 100% an alpha completion model, so there will be quirks in its usage parameters.
33
+
34
+ I will refine the model for completion and also create an instruct/chat variant.
35
+
36
+ ## Intended uses & limitations
37
+
38
+ Use differing system prompts for code translation, or use the model for tab autocomplete with [continue.dev](https://www.continue.dev/).
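
Since this is a completion model, autocomplete-style use is plain text continuation rather than chat. A minimal sketch with 🤗 Transformers follows; the model path, prefix, and generation settings are illustrative assumptions, not official settings from this card:

```python
# Minimal completion-style call: feed a raw code prefix and let the model continue it,
# the same mode a tab-autocomplete integration such as continue.dev would exercise.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./LLaMutation-Qwen2.5-14B-SFFT-v0.0"  # assumed local path or Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prefix = "def fibonacci(n: int) -> int:\n    "
inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Print only the newly generated continuation, not the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```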
39
+
40
+ ## Chat template and sampling parameters
41
+
42
+ Chat template is chatml.
43
+
44
+ Sampling parameters used for generation and for the hackathon demo are shown below:
45
+
46
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/YzQ8nqu83lEhl3Kg4u0PC.png)
47
+
48
+ ### THE FOLLOWING SYSTEM PROMPT MUST BE USED FOR THIS MODEL
49
+
50
+ `You are an AI assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non-code text to the minimum possible. DO NOT REPEAT ANY NON-CODE TEXT. ONLY PRINT OUT CODE ONCE, DO NOT ITERATE!`
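
For reference, a minimal sketch of applying that system prompt through the ChatML chat template with 🤗 Transformers; the model path and `max_new_tokens` are illustrative assumptions, and the actual demo sampling settings are the ones in the screenshot above:

```python
# Code translation via the chatml template, with the required system prompt as the system turn.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./LLaMutation-Qwen2.5-14B-SFFT-v0.0"  # assumed local path or Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

system_prompt = (
    "You are an AI assistant that is an expert at converting code from any language to another "
    "within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. "
    "Keep non-code text to the minimum possible. DO NOT REPEAT ANY NON-CODE TEXT. "
    "ONLY PRINT OUT CODE ONCE, DO NOT ITERATE!"
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Translate this Python function to Rust:\ndef add(a, b):\n    return a + b"},
]
# apply_chat_template renders the chatml turns and returns input ids ready for generation.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```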
51
+
52
+ ## Training procedure
53
+
54
+ Spectrum FFT/SFFT
55
+
56
+ ### Training hyperparameters
57
+
58
+ The following hyperparameters were used during training:
59
+ - learning_rate: 0.0005
60
+ - train_batch_size: 1
61
+ - eval_batch_size: 1
62
+ - seed: 42
63
+ - distributed_type: multi-GPU
64
+ - num_devices: 8
65
+ - gradient_accumulation_steps: 4
66
+ - total_train_batch_size: 32
67
+ - total_eval_batch_size: 8
68
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
69
+ - lr_scheduler_type: linear
70
+ - lr_scheduler_warmup_steps: 50
71
+ - num_epochs: 1
72
+
73
+ ### Training results
74
+
75
+ | Training Loss | Epoch | Step | Validation Loss |
76
+ |:-------------:|:------:|:----:|:---------------:|
77
+ | 0.3948 | 0.0237 | 1 | 0.3920 |
78
+ | 0.2392 | 0.4970 | 21 | 0.2500 |
79
+ | 0.2606 | 0.9941 | 42 | 0.2621 |
80
+
81
+
82
+ ### Framework versions
83
+
84
+ - Transformers 4.45.2
85
+ - Pytorch 2.3.1+cu121
86
+ - Datasets 3.0.1
87
+ - Tokenizers 0.20.1
88
+
89
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
90
+ <details><summary>See axolotl config</summary>
91
+
92
+ axolotl version: `0.4.1`
93
+ ```yaml
94
+ base_model: Qwen/Qwen2.5-14B
95
+
96
+ load_in_8bit: false
97
+ load_in_4bit: false
98
+ strict: false
99
+
100
+ plugins:
101
+ - axolotl.integrations.liger.LigerPlugin
102
+ liger_rope: true
103
+ liger_rms_norm: true
104
+ liger_swiglu: true
105
+ liger_fused_linear_cross_entropy: true
106
+
107
+ plugins:
108
+ - axolotl.integrations.spectrum.SpectrumPlugin
109
+
110
+ spectrum_top_fraction: 0.5
111
+ # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
112
+ spectrum_model_name: Qwen/Qwen2.5-14B
113
+
114
+ datasets:
115
+ - path: datasets/LLaMutation.jsonl
116
+ type: sharegpt
117
+ - path: datasets/LLaMutationMAX_Train.json
118
+ type: sharegpt
119
+
120
+ chat_template: chatml
121
+ shuffle_merged_datasets: true
122
+ val_set_size: 0.1
123
+ output_dir: ./LLaMutation-Qwen2.5-14B-SFFT-v0.0
124
+
125
+ sequence_len: 8192
126
+ sample_packing: true
127
+ eval_sample_packing: true
128
+ pad_to_sequence_len: true
129
+
130
+ # adapter: qlora
131
+ # lora_model_dir:
132
+ # lora_r: 32
133
+ # lora_alpha: 16
134
+ # lora_dropout: 0.05
135
+ # lora_target_linear: true
136
+ # peft_use_dora: true
137
+
138
+ wandb_project: LLaMutation-Qwen2.5-14B-SFFT-v0.0
139
+ wandb_entity:
140
+ wandb_watch:
141
+ wandb_name: Unit-00
142
+ wandb_log_model:
143
+
144
+ gradient_accumulation_steps: 4
145
+ micro_batch_size: 1
146
+ num_epochs: 1
147
+ optimizer: adamw_torch
148
+ lr_scheduler: linear
149
+ learning_rate: 0.0005
150
+ max_grad_norm: 3
151
+
152
+ train_on_inputs: false
153
+ group_by_length: false
154
+ bf16: auto
155
+ fp16:
156
+ tf32: true
157
+
158
+ gradient_checkpointing: true
159
+ gradient_checkpointing_kwargs:
160
+ use_reentrant: true
161
+ early_stopping_patience:
162
+ resume_from_checkpoint:
163
+ local_rank:
164
+ logging_steps: 1
165
+ xformers_attention:
166
+ flash_attention: true
167
+
168
+ warmup_steps: 50
169
+ evals_per_epoch: 2
170
+ saves_per_epoch: 2
171
+ save_safetensors: true
172
+ hub_model_id:
173
+ hub_strategy:
174
+ debug:
175
+ deepspeed: deepspeed_configs/zero3_bf16.json
176
+ weight_decay: 0.1
177
+ # fsdp:
178
+ # - full_shard
179
+ # - auto_wrap
180
+ # fsdp_config:
181
+ # fsdp_limit_all_gathers: true
182
+ # fsdp_sync_module_states: true
183
+ # fsdp_offload_params: false # Changed from true
184
+ # fsdp_use_orig_params: true # Changed from false
185
+ # fsdp_cpu_ram_efficient_loading: true
186
+ # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
187
+ # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
188
+ # fsdp_activation_checkpointing: true
189
+ # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
190
+ # fsdp_sharding_strategy: FULL_SHARD
191
+ # fsdp_forward_prefetch: true # Added
192
+ # fsdp_backward_prefetch: "BACKWARD_POST" # Added
193
+ # fsdp_backward_prefetch_limit: 1 # Added
194
+ # fsdp_mixed_precision: BF16 # Added
195
+ ```
196
+
197
  </details><br>