bunyaminergen committed

Commit 94d91fc · 1 Parent(s): 756e8dd

Files changed (1): README.md (+76 -0)

README.md CHANGED
@@ -34,6 +34,7 @@ please read the [CONTRIBUTING](CONTRIBUTING.md) first._
  - [Usage](#usage)
  - [Comparison](#comparison)
  - [Dataset](#dataset)
+ - [Training](#training)
  - [Documentations](#documentations)
  - [License](#licence)
  - [Links](#links)
@@ -169,6 +170,81 @@ This implementation includes error handling and examples for usage.

  ---

+ ### Training
+
+ #### Hyperparameters
+
+ | Hyperparameter              | Value                                 |
+ |-----------------------------|---------------------------------------|
+ | Base Model                  | `Qwen/Qwen2.5-Coder-1.5B-Instruct`    |
+ | Fine-tuning Method          | QLoRA (Quantized Low-Rank Adaptation) |
+ | Task Type                   | `CAUSAL_LM`                           |
+ | Number of Epochs            | `11`                                  |
+ | Batch Size                  | `8`                                   |
+ | Gradient Accumulation Steps | `2`                                   |
+ | Effective Batch Size        | `16` (8 × 2)                          |
+ | Learning Rate               | `1e-4`                                |
+ | Optimizer                   | `AdamW`                               |
+ | Precision                   | `BF16 Mixed Precision`                |
+ | Evaluation Strategy         | `None`                                |
+ | Max Sequence Length         | `1024` tokens                         |
+ | Logging Steps               | every `1000` steps                    |
+ | Save Checkpoint Steps       | every `7200` steps                    |
+ | Output Directory            | Overwritten per run                   |
+ | Experiment Tracking         | `MLflow` (local tracking)             |
+ | Experiment Name             | `AssistantFineTuning`                 |
+ | MLflow Run Name             | `AssistantFT`                         |
+
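As a rough orientation, the table above maps onto Hugging Face `TrainingArguments` along these lines. This is a minimal sketch based on the public `transformers` API; the output directory is a placeholder and the actual training script may differ.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameter table above.
# The MLflow experiment name ("AssistantFineTuning") is usually picked up from the
# MLFLOW_EXPERIMENT_NAME environment variable rather than a TrainingArguments field.
training_args = TrainingArguments(
    output_dir="output",                  # placeholder; overwritten per run
    overwrite_output_dir=True,
    num_train_epochs=11,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,        # effective batch size: 8 x 2 = 16
    learning_rate=1e-4,
    optim="adamw_torch",                  # AdamW
    bf16=True,                            # BF16 mixed precision
    eval_strategy="no",                   # `evaluation_strategy` on older transformers
    logging_steps=1000,
    save_steps=7200,
    report_to="mlflow",
    run_name="AssistantFT",
)
```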
+ #### PEFT (QLoRA) Configuration
+
+ | Parameter       | Value                    |
+ |-----------------|--------------------------|
+ | LoRA Rank (`r`) | `16`                     |
+ | LoRA Alpha      | `32`                     |
+ | LoRA Dropout    | `0.05`                   |
+ | Target Modules  | `all-linear`             |
+ | Modules Saved   | `lm_head`, `embed_token` |
+
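In `peft` terms, the adapter settings above correspond to a `LoraConfig` roughly like the following sketch; the field names come from the PEFT API and the values from the table.

```python
from peft import LoraConfig

# Sketch of the QLoRA adapter configuration described above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",                 # wrap every linear layer
    modules_to_save=["lm_head", "embed_token"],  # trained and saved in full, not as adapters
    task_type="CAUSAL_LM",
)
```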
+ #### Dataset
+
+ - **Train/Test Split:** `90%/10%`
+ - **Random Seed:** `19`
+ - **Train Batched:** `True`
+ - **Eval Batched:** `True`
+
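With the `datasets` library, that split corresponds to something like the sketch below; the data source is a placeholder, and only the split ratio and seed reflect the settings listed above.

```python
from datasets import load_dataset

# Placeholder data source; only test_size and seed reflect the settings above.
dataset = load_dataset("json", data_files="data.jsonl")["train"]
splits = dataset.train_test_split(test_size=0.1, seed=19)  # 90% / 10%, seed 19
train_dataset, eval_dataset = splits["train"], splits["test"]
```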
+ #### Tokenizer Configuration
+
+ - **Truncation:** Enabled (`max_length=1024`)
+ - **Masked Language Modeling (MLM):** `False`
+
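A minimal sketch of that preprocessing, assuming a hypothetical `text` field; `mlm=False` in the data collator yields standard causal-LM labels, matching the setting above.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-1.5B-Instruct")

def tokenize_fn(batch):
    # "text" is a hypothetical field name; the actual dataset schema may differ.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

# mlm=False -> causal-LM labels (input ids copied to labels), no masking.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```

Both splits from the dataset sketch would then be passed through `tokenize_fn` via `map(..., batched=True)`, which is presumably what the *Train/Eval Batched: `True`* entries above refer to.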
+ #### Speeds, Sizes, Times
+
+ - **Total Training Time:** ~11 hours
+ - **Checkpoint Frequency:** every `7200` steps
+ - **Checkpoint Steps:**
+   - `checkpoint-7200`
+   - `checkpoint-14400`
+   - `checkpoint-21600`
+   - `checkpoint-28800`
+   - `checkpoint-36000`
+   - `checkpoint-39600` *(final checkpoint)*
+
+ #### Compute Infrastructure
+
+ **Hardware:**
+
+ - GPU: **1 × NVIDIA L40S (48 GB VRAM)**
+ - RAM: **62 GB**
+ - CPU: **16 vCPU**
+
+ **Software:**
+
+ - OS: **Ubuntu 22.04**
+ - Framework: **PyTorch 2.4.0**
+ - CUDA Version: **12.4.1**
+
+ ---
+
  ### Documentations

  - [CONTRIBUTING](CONTRIBUTING.md)