lbourdois committed · Commit 9e14394 · verified · 1 Parent(s): a29cf36

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +196 -182
README.md CHANGED
@@ -1,183 +1,197 @@
1
- ---
2
- library_name: transformers
3
- license: apache-2.0
4
- base_model: Qwen/Qwen2.5-14B
5
- model-index:
6
- - name: LLaMutation-Qwen2.5-14B-SFFT-v0.0
7
- results: []
8
- ---
9
-
10
- # LLaMutation-Qwen2.5-14B-SFFT-v0.0
11
-
12
- ![image/webp](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/IFK02cTih72zfZfT5UY4f.webp)
13
-
14
- This model is a [Spectrum](https://github.com/axolotl-ai-cloud/axolotl/blob/67f744dc8c9564ef7a42d5df780ae53e319dca61/src/axolotl/integrations/spectrum/README.md) FFT of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on a code translation dataset evolved with [EvolKit](https://github.com/arcee-ai/EvolKit).
15
-
16
- ## Model description
17
-
18
- Code translation and completion model trained on Qwen2.5-14B, as there is not yet a Qwen2.5-Coder-14B model. This is 100% an alpha completion model, so there will be quirks in its usage parameters.
19
-
20
- I will refine the model for completion and also create an instruct/chat variant.
21
-
22
- ## Intended uses & limitations
23
-
24
- Use differing system prompts for code translation, or use the model for tab autocomplete with [continue.dev](https://www.continue.dev/).
25
-
26
- ## Chat template and sampling parameters
27
-
28
- Chat template is chatml.
29
-
30
- Sampling parameters used for generation and for the hackathon demo are shown below:
31
-
32
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/YzQ8nqu83lEhl3Kg4u0PC.png)
33
-
34
- ### THE FOLLOWING SYSTEM PROMPT MUST BE USED FOR THIS MODEL
35
-
36
- `You are an AI assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non-code text to the minimum possible. DO NOT REPEAT ANY NON-CODE TEXT. ONLY PRINT OUT CODE ONCE, DO NOT ITERATE!`
37
-
38
- ## Training procedure
39
-
40
- Spectrum FFT/SFFT
41
-
42
- ### Training hyperparameters
43
-
44
- The following hyperparameters were used during training:
45
- - learning_rate: 0.0005
46
- - train_batch_size: 1
47
- - eval_batch_size: 1
48
- - seed: 42
49
- - distributed_type: multi-GPU
50
- - num_devices: 8
51
- - gradient_accumulation_steps: 4
52
- - total_train_batch_size: 32
53
- - total_eval_batch_size: 8
54
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
55
- - lr_scheduler_type: linear
56
- - lr_scheduler_warmup_steps: 50
57
- - num_epochs: 1
58
-
59
- ### Training results
60
-
61
- | Training Loss | Epoch | Step | Validation Loss |
62
- |:-------------:|:------:|:----:|:---------------:|
63
- | 0.3948 | 0.0237 | 1 | 0.3920 |
64
- | 0.2392 | 0.4970 | 21 | 0.2500 |
65
- | 0.2606 | 0.9941 | 42 | 0.2621 |
66
-
67
-
68
- ### Framework versions
69
-
70
- - Transformers 4.45.2
71
- - Pytorch 2.3.1+cu121
72
- - Datasets 3.0.1
73
- - Tokenizers 0.20.1
74
-
75
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
76
- <details><summary>See axolotl config</summary>
77
-
78
- axolotl version: `0.4.1`
79
- ```yaml
80
- base_model: Qwen/Qwen2.5-14B
81
-
82
- load_in_8bit: false
83
- load_in_4bit: false
84
- strict: false
85
-
86
- plugins:
87
- - axolotl.integrations.liger.LigerPlugin
88
- liger_rope: true
89
- liger_rms_norm: true
90
- liger_swiglu: true
91
- liger_fused_linear_cross_entropy: true
92
-
93
- plugins:
94
- - axolotl.integrations.spectrum.SpectrumPlugin
95
-
96
- spectrum_top_fraction: 0.5
97
- # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
98
- spectrum_model_name: Qwen/Qwen2.5-14B
99
-
100
- datasets:
101
- - path: datasets/LLaMutation.jsonl
102
- type: sharegpt
103
- - path: datasets/LLaMutationMAX_Train.json
104
- type: sharegpt
105
-
106
- chat_template: chatml
107
- shuffle_merged_datasets: true
108
- val_set_size: 0.1
109
- output_dir: ./LLaMutation-Qwen2.5-14B-SFFT-v0.0
110
-
111
- sequence_len: 8192
112
- sample_packing: true
113
- eval_sample_packing: true
114
- pad_to_sequence_len: true
115
-
116
- # adapter: qlora
117
- # lora_model_dir:
118
- # lora_r: 32
119
- # lora_alpha: 16
120
- # lora_dropout: 0.05
121
- # lora_target_linear: true
122
- # peft_use_dora: true
123
-
124
- wandb_project: LLaMutation-Qwen2.5-14B-SFFT-v0.0
125
- wandb_entity:
126
- wandb_watch:
127
- wandb_name: Unit-00
128
- wandb_log_model:
129
-
130
- gradient_accumulation_steps: 4
131
- micro_batch_size: 1
132
- num_epochs: 1
133
- optimizer: adamw_torch
134
- lr_scheduler: linear
135
- learning_rate: 0.0005
136
- max_grad_norm: 3
137
-
138
- train_on_inputs: false
139
- group_by_length: false
140
- bf16: auto
141
- fp16:
142
- tf32: true
143
-
144
- gradient_checkpointing: true
145
- gradient_checkpointing_kwargs:
146
- use_reentrant: true
147
- early_stopping_patience:
148
- resume_from_checkpoint:
149
- local_rank:
150
- logging_steps: 1
151
- xformers_attention:
152
- flash_attention: true
153
-
154
- warmup_steps: 50
155
- evals_per_epoch: 2
156
- saves_per_epoch: 2
157
- save_safetensors: true
158
- hub_model_id:
159
- hub_strategy:
160
- debug:
161
- deepspeed: deepspeed_configs/zero3_bf16.json
162
- weight_decay: 0.1
163
- # fsdp:
164
- # - full_shard
165
- # - auto_wrap
166
- # fsdp_config:
167
- # fsdp_limit_all_gathers: true
168
- # fsdp_sync_module_states: true
169
- # fsdp_offload_params: false # Changed from true
170
- # fsdp_use_orig_params: true # Changed from false
171
- # fsdp_cpu_ram_efficient_loading: true
172
- # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
173
- # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
174
- # fsdp_activation_checkpointing: true
175
- # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
176
- # fsdp_sharding_strategy: FULL_SHARD
177
- # fsdp_forward_prefetch: true # Added
178
- # fsdp_backward_prefetch: "BACKWARD_POST" # Added
179
- # fsdp_backward_prefetch_limit: 1 # Added
180
- # fsdp_mixed_precision: BF16 # Added
181
- ```
182
-
183
  </details><br>
 
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model: Qwen/Qwen2.5-14B
5
+ language:
6
+ - zho
7
+ - eng
8
+ - fra
9
+ - spa
10
+ - por
11
+ - deu
12
+ - ita
13
+ - rus
14
+ - jpn
15
+ - kor
16
+ - vie
17
+ - tha
18
+ - ara
19
+ model-index:
20
+ - name: LLaMutation-Qwen2.5-14B-SFFT-v0.0
21
+ results: []
22
+ ---
23
+
24
+ # LLaMutation-Qwen2.5-14B-SFFT-v0.0
25
+
26
+ ![image/webp](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/IFK02cTih72zfZfT5UY4f.webp)
27
+
28
+ This model is a [Spectrum](https://github.com/axolotl-ai-cloud/axolotl/blob/67f744dc8c9564ef7a42d5df780ae53e319dca61/src/axolotl/integrations/spectrum/README.md) FFT of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on a code translation dataset evolved with [EvolKit](https://github.com/arcee-ai/EvolKit).
29
+
30
+ ## Model description
31
+
32
+ Code translation and completion model trained on Qwen2.5-14B, as there is not yet a Qwen2.5-Coder-14B model. This is 100% an alpha completion model, so there will be quirks in its usage parameters.
33
+
34
+ I will refine the model for completion and also create an instruct/chat variant.
35
+
36
+ ## Intended uses & limitations
37
+
38
+ Use differing system prompts for code translation, or use the model for tab autocomplete with [continue.dev](https://www.continue.dev/).
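
Since this is a completion model, autocomplete-style use is plain text continuation rather than chat. A minimal sketch with 🤗 Transformers follows; the model path, prefix, and generation settings are illustrative assumptions, not official settings from this card:

```python
# Minimal completion-style call: feed a raw code prefix and let the model continue it,
# the same mode a tab-autocomplete integration such as continue.dev would exercise.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./LLaMutation-Qwen2.5-14B-SFFT-v0.0"  # assumed local path or Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prefix = "def fibonacci(n: int) -> int:\n    "
inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Print only the newly generated continuation, not the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```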
39
+
40
+ ## Chat template and sampling parameters
41
+
42
+ Chat template is chatml.
43
+
44
+ Sampling parameters used for generation and for the hackathon demo are shown below:
45
+
46
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/YzQ8nqu83lEhl3Kg4u0PC.png)
47
+
48
+ ### THE FOLLOWING SYSTEM PROMPT MUST BE USED FOR THIS MODEL
49
+
50
+ `You are an AI assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non-code text to the minimum possible. DO NOT REPEAT ANY NON-CODE TEXT. ONLY PRINT OUT CODE ONCE, DO NOT ITERATE!`
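
For reference, a minimal sketch of applying that system prompt through the ChatML chat template with 🤗 Transformers; the model path and `max_new_tokens` are illustrative assumptions, and the actual demo sampling settings are the ones in the screenshot above:

```python
# Code translation via the chatml template, with the required system prompt as the system turn.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./LLaMutation-Qwen2.5-14B-SFFT-v0.0"  # assumed local path or Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

system_prompt = (
    "You are an AI assistant that is an expert at converting code from any language to another "
    "within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. "
    "Keep non-code text to the minimum possible. DO NOT REPEAT ANY NON-CODE TEXT. "
    "ONLY PRINT OUT CODE ONCE, DO NOT ITERATE!"
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Translate this Python function to Rust:\ndef add(a, b):\n    return a + b"},
]
# apply_chat_template renders the chatml turns and returns input ids ready for generation.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```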
51
+
52
+ ## Training procedure
53
+
54
+ Spectrum FFT/SFFT
55
+
56
+ ### Training hyperparameters
57
+
58
+ The following hyperparameters were used during training:
59
+ - learning_rate: 0.0005
60
+ - train_batch_size: 1
61
+ - eval_batch_size: 1
62
+ - seed: 42
63
+ - distributed_type: multi-GPU
64
+ - num_devices: 8
65
+ - gradient_accumulation_steps: 4
66
+ - total_train_batch_size: 32
67
+ - total_eval_batch_size: 8
68
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
69
+ - lr_scheduler_type: linear
70
+ - lr_scheduler_warmup_steps: 50
71
+ - num_epochs: 1
72
+
73
+ ### Training results
74
+
75
+ | Training Loss | Epoch | Step | Validation Loss |
76
+ |:-------------:|:------:|:----:|:---------------:|
77
+ | 0.3948 | 0.0237 | 1 | 0.3920 |
78
+ | 0.2392 | 0.4970 | 21 | 0.2500 |
79
+ | 0.2606 | 0.9941 | 42 | 0.2621 |
80
+
81
+
82
+ ### Framework versions
83
+
84
+ - Transformers 4.45.2
85
+ - Pytorch 2.3.1+cu121
86
+ - Datasets 3.0.1
87
+ - Tokenizers 0.20.1
88
+
89
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
90
+ <details><summary>See axolotl config</summary>
91
+
92
+ axolotl version: `0.4.1`
93
+ ```yaml
94
+ base_model: Qwen/Qwen2.5-14B
95
+
96
+ load_in_8bit: false
97
+ load_in_4bit: false
98
+ strict: false
99
+
100
+ plugins:
101
+ - axolotl.integrations.liger.LigerPlugin
102
+ liger_rope: true
103
+ liger_rms_norm: true
104
+ liger_swiglu: true
105
+ liger_fused_linear_cross_entropy: true
106
+
107
+ plugins:
108
+ - axolotl.integrations.spectrum.SpectrumPlugin
109
+
110
+ spectrum_top_fraction: 0.5
111
+ # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
112
+ spectrum_model_name: Qwen/Qwen2.5-14B
113
+
114
+ datasets:
115
+ - path: datasets/LLaMutation.jsonl
116
+ type: sharegpt
117
+ - path: datasets/LLaMutationMAX_Train.json
118
+ type: sharegpt
119
+
120
+ chat_template: chatml
121
+ shuffle_merged_datasets: true
122
+ val_set_size: 0.1
123
+ output_dir: ./LLaMutation-Qwen2.5-14B-SFFT-v0.0
124
+
125
+ sequence_len: 8192
126
+ sample_packing: true
127
+ eval_sample_packing: true
128
+ pad_to_sequence_len: true
129
+
130
+ # adapter: qlora
131
+ # lora_model_dir:
132
+ # lora_r: 32
133
+ # lora_alpha: 16
134
+ # lora_dropout: 0.05
135
+ # lora_target_linear: true
136
+ # peft_use_dora: true
137
+
138
+ wandb_project: LLaMutation-Qwen2.5-14B-SFFT-v0.0
139
+ wandb_entity:
140
+ wandb_watch:
141
+ wandb_name: Unit-00
142
+ wandb_log_model:
143
+
144
+ gradient_accumulation_steps: 4
145
+ micro_batch_size: 1
146
+ num_epochs: 1
147
+ optimizer: adamw_torch
148
+ lr_scheduler: linear
149
+ learning_rate: 0.0005
150
+ max_grad_norm: 3
151
+
152
+ train_on_inputs: false
153
+ group_by_length: false
154
+ bf16: auto
155
+ fp16:
156
+ tf32: true
157
+
158
+ gradient_checkpointing: true
159
+ gradient_checkpointing_kwargs:
160
+ use_reentrant: true
161
+ early_stopping_patience:
162
+ resume_from_checkpoint:
163
+ local_rank:
164
+ logging_steps: 1
165
+ xformers_attention:
166
+ flash_attention: true
167
+
168
+ warmup_steps: 50
169
+ evals_per_epoch: 2
170
+ saves_per_epoch: 2
171
+ save_safetensors: true
172
+ hub_model_id:
173
+ hub_strategy:
174
+ debug:
175
+ deepspeed: deepspeed_configs/zero3_bf16.json
176
+ weight_decay: 0.1
177
+ # fsdp:
178
+ # - full_shard
179
+ # - auto_wrap
180
+ # fsdp_config:
181
+ # fsdp_limit_all_gathers: true
182
+ # fsdp_sync_module_states: true
183
+ # fsdp_offload_params: false # Changed from true
184
+ # fsdp_use_orig_params: true # Changed from false
185
+ # fsdp_cpu_ram_efficient_loading: true
186
+ # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
187
+ # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
188
+ # fsdp_activation_checkpointing: true
189
+ # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
190
+ # fsdp_sharding_strategy: FULL_SHARD
191
+ # fsdp_forward_prefetch: true # Added
192
+ # fsdp_backward_prefetch: "BACKWARD_POST" # Added
193
+ # fsdp_backward_prefetch_limit: 1 # Added
194
+ # fsdp_mixed_precision: BF16 # Added
195
+ ```
196
+
197
  </details><br>