PJMixers-Dev
/

LLaMa-3.2-Text-Cleaner-v0.1-1B

Safetensors

llama

Model card Files Files and versions

xet

Community

xzuyn commited on Jun 14

Commit

b92f5e6

verified ·

1 Parent(s): 406d0c1

Create README.md

Browse files

Files changed (1) hide show

README.md +129 -0

README.md ADDED Viewed

	@@ -0,0 +1,129 @@

+---
+datasets:
+- PJMixers-Dev/Nelathan_synthetic-sugar-quill-cleaner
+base_model:
+- meta-llama/Llama-3.2-1B
+---
+# PJMixers-Dev/LLaMa-3.2-Text-Cleaner-v0.1-1B
+Model was trained at 16,384 max length, so potentially 8K input 8K output. Model will likely *heavily* reformat text, but hopefully end up with a cleaner result.
+Probably not good for cleaning something you need to be 100% accurate to the original, like educational texts, but probably fine for cleaning creative writing datasets.
+## Prompt format
+```
+<|begin_of_text|><|unclean_text|>Put your uncleaned text here.<|unclean_text|>The model will repond with a cleaned version here.<|end_of_text|>
+```
+Example using a sample from [PJMixers/RyokoAI_Honeyfeed3600](https://huggingface.co/datasets/PJMixers/RyokoAI_Honeyfeed3600)), which the model has not been trained on:
+```
+<|begin_of_text|><|unclean_text|>MODEL STILL TRAINING<|unclean_text|>MODEL STILL TRAINING<|end_of_text|>
+```
+## Axolotl Config
+```yaml
+# Requirements before running
+#   - Get latest commit of axolotl (currently c0a0c75)
+#   - Download these to axolotl/src/axolotl/prompt_formatters
+#     - https://github.com/xzuyn/axolotl/blob/came-plus-formatters/src/axolotl/prompt_strategies/text-cleaner.py
+#   - pip install ftfy
+#   - pip install git+https://github.com/xzuyn/CAME.git@sr-grams-cautious-8bit
+# Weights and Biases logging config
+wandb_project: LLaMa-3.2-1B
+wandb_entity:
+wandb_watch:
+wandb_name: LLaMa-3.2-Text-Cleaner-v0.1-1B-FFT-run2
+wandb_log_model:
+# Model checkpointing config
+output_dir: ./Outputs/LLaMa-3.2-Text-Cleaner-v0.1-1B-FFT-run2
+save_steps: 10
+save_safetensors: true
+save_total_limit: 2
+save_only_model: true
+# Model architecture config
+base_model: meta-llama/Llama-3.2-1B
+model_type: AutoModelForCausalLM
+tokenizer_type: AutoTokenizer
+# Mixed precision training config
+bf16: true
+fp16: false
+tf32: false
+# Model loading config
+load_in_8bit: false
+load_in_4bit: false
+strict: false
+# Sequence config
+sequence_len: 16384
+min_sample_len: 256
+sample_packing: false
+eval_sample_packing: false
+pad_to_sequence_len: false
+train_on_inputs: false
+group_by_length: false
+# Dataset config
+datasets:
+  - path: PJMixers-Dev/Nelathan_synthetic-sugar-quill-cleaner
+    type: text-cleaner
+val_set_size: 128
+eval_strategy: steps
+eval_steps: 10
+dataset_prepared_path: ./00-Tokenized-Datasets/LLaMa-3.2-Text-Cleaner-v0.1-1B-seed42
+shuffle_merged_datasets: true
+dataset_exact_deduplication: true
+# Training hyperparameters
+num_epochs: 1
+gradient_accumulation_steps: 1
+micro_batch_size: 8
+eval_batch_size: 8
+warmup_steps: 10
+optimizer: came_pytorch
+optim_args:
+  enable_stochastic_rounding: true
+  enable_cautious: true
+  enable_8bit: true
+  enable_gc: true
+lr_scheduler: rex
+learning_rate: 1e-6
+cosine_min_lr_ratio: 0.05
+weight_decay: 0.01
+max_grad_norm: 0.5
+logging_steps: 1
+# Model optimization
+embeddings_skip_upcast: true
+gradient_checkpointing: offload
+sdp_attention: true
+plugins:
+  - axolotl.integrations.liger.LigerPlugin
+  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+cut_cross_entropy: true
+liger_rope: true
+liger_rms_norm: true
+liger_layer_norm: true
+liger_glu_activation: true
+liger_cross_entropy: false
+liger_fused_linear_cross_entropy: false
+# Debug config
+debug: true
+seed: 42
+# Token config
+added_tokens_overrides:
+  128011: "<|unclean_text|>"
+  128012: "<|clean_text|>"
+special_tokens:
+  bos_token: "<|begin_of_text|>"
+  eos_token: "<|end_of_text|>"
+  pad_token: "<|finetune_right_pad_id|>"
+tokens:
+```