Commit 94d91fc · 1 parent: 756e8dd · Initial

README.md CHANGED

@@ -34,6 +34,7 @@ please read the [CONTRIBUTING](CONTRIBUTING.md) first._

- [Usage](#usage)
- [Comparison](#comparison)
- [Dataset](#dataset)
- [Training](#training)
- [Documentations](#documentations)
- [License](#licence)
- [Links](#links)

@@ -169,6 +170,81 @@ This implementation includes error handling and examples for usage.

---

### Training

#### Hyperparameters

| Hyperparameter              | Value                                 |
|-----------------------------|---------------------------------------|
| Base Model                  | `Qwen/Qwen2.5-Coder-1.5B-Instruct`    |
| Fine-tuning Method          | QLoRA (Quantized Low-Rank Adaptation) |
| Task Type                   | `CAUSAL_LM`                           |
| Number of Epochs            | `11`                                  |
| Batch Size                  | `8`                                   |
| Gradient Accumulation Steps | `2`                                   |
| Effective Batch Size        | `16` (8 × 2)                          |
| Learning Rate               | `1e-4`                                |
| Optimizer                   | `AdamW`                               |
| Precision                   | BF16 mixed precision                  |
| Evaluation Strategy         | `None`                                |
| Max Sequence Length         | `1024` tokens                         |
| Logging Steps               | every `1000` steps                    |
| Save Checkpoint Steps       | every `7200` steps                    |
| Output Directory            | overwritten per run                   |
| Experiment Tracking         | `MLflow` (local tracking)             |
| Experiment Name             | `AssistantFineTuning`                 |
| MLflow Run Name             | `AssistantFT`                         |
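
For reference, a minimal sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments`. The output path and the MLflow wiring are assumptions, not the project's actual script:

```python
# Hypothetical sketch: the table above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/assistant-ft",  # placeholder path; overwritten per run
    overwrite_output_dir=True,
    num_train_epochs=11,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,      # effective batch size: 8 x 2 = 16
    learning_rate=1e-4,
    optim="adamw_torch",                # AdamW optimizer
    bf16=True,                          # BF16 mixed precision
    eval_strategy="no",                 # no evaluation during training
    logging_steps=1000,
    save_steps=7200,
    report_to="mlflow",                 # local MLflow tracking
    run_name="AssistantFT",             # MLflow run name
)
# The experiment name ("AssistantFineTuning") is typically set via the
# MLFLOW_EXPERIMENT_NAME environment variable, picked up by the MLflow callback.
```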

#### PEFT (QLoRA) Configuration

| Parameter       | Value                    |
|-----------------|--------------------------|
| LoRA Rank (`r`) | `16`                     |
| LoRA Alpha      | `32`                     |
| LoRA Dropout    | `0.05`                   |
| Target Modules  | `all-linear`             |
| Modules Saved   | `lm_head`, `embed_token` |
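
A minimal sketch of this configuration with the `peft` library; the 4-bit base-model loading implied by "QLoRA" is noted in the comments as an assumption:

```python
# Hypothetical sketch of the PEFT settings listed above.
from peft import LoraConfig

peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                         # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",  # attach adapters to every linear layer
    modules_to_save=["lm_head", "embed_token"],  # kept fully trainable, as listed above
)

# QLoRA additionally loads the base model quantized to 4-bit; a typical
# setup (an assumption, not confirmed by the table) would be:
#   from transformers import BitsAndBytesConfig
#   bnb = BitsAndBytesConfig(load_in_4bit=True)
```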

#### Dataset

- **Train/Test Split:** `90%/10%`
- **Random Seed:** `19`
- **Train Batched:** `True`
- **Eval Batched:** `True`
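
A minimal sketch of this split using the `datasets` library; the data file and loading call are placeholders:

```python
# Hypothetical sketch of the 90/10 split with seed 19 described above.
from datasets import load_dataset

dataset = load_dataset("json", data_files="data.jsonl", split="train")  # placeholder source
splits = dataset.train_test_split(test_size=0.1, seed=19)  # 90% train / 10% test
train_ds, eval_ds = splits["train"], splits["test"]

# "Batched: True" refers to batched map() during preprocessing, e.g.:
#   train_ds = train_ds.map(tokenize_fn, batched=True)
```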

#### Tokenizer Configuration

- **Truncation:** Enabled (`max_length=1024`)
- **Masked Language Modeling (MLM):** `False`
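
A minimal sketch of these settings with `transformers`; the `text` column name is an assumption:

```python
# Hypothetical sketch of the tokenizer configuration above: truncation at
# 1024 tokens, and a collator with mlm=False (causal LM, not masked LM).
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-1.5B-Instruct")

def tokenize_fn(batch):
    # "text" is a placeholder column name
    return tokenizer(batch["text"], truncation=True, max_length=1024)

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```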

#### Speeds, Sizes, Times

- **Total Training Time:** ~11 hours
- **Checkpoint Frequency:** every `7200` steps
- **Checkpoint Steps:**
  - `checkpoint-7200`
  - `checkpoint-14400`
  - `checkpoint-21600`
  - `checkpoint-28800`
  - `checkpoint-36000`
  - `checkpoint-39600` *(final checkpoint)*
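
A minimal sketch of loading the final checkpoint for inference; the output directory below is a placeholder, only the checkpoint name comes from the list above:

```python
# Hypothetical sketch: attach the final LoRA adapter to the base model.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "outputs/assistant-ft/checkpoint-39600")
```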

#### Compute Infrastructure

**Hardware:**

- GPU: **1 × NVIDIA L40S (48 GB VRAM)**
- RAM: **62 GB**
- CPU: **16 vCPU**

**Software:**

- OS: **Ubuntu 22.04**
- Framework: **PyTorch 2.4.0**
- CUDA Version: **12.4.1**
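
For anyone reproducing this setup, a quick sanity check (not part of the repository) that the environment matches:

```python
# Hypothetical environment check: confirm the GPU is visible and
# supports BF16 before launching training.
import torch

assert torch.cuda.is_available(), "no CUDA device visible"
print(torch.cuda.get_device_name(0))   # expect something like: NVIDIA L40S
print(torch.version.cuda)              # expect: 12.4
assert torch.cuda.is_bf16_supported(), "BF16 mixed precision unsupported"
```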

---

### Documentations

- [CONTRIBUTING](CONTRIBUTING.md)