bunyaminergen committed

Commit 94d91fc · 1 Parent(s): 756e8dd

Files changed (1): README.md (+76 -0)

README.md CHANGED
@@ -34,6 +34,7 @@ please read the [CONTRIBUTING](CONTRIBUTING.md) first._
  - [Usage](#usage)
  - [Comparison](#comparison)
  - [Dataset](#dataset)
+ - [Training](#training)
  - [Documentations](#documentations)
  - [License](#licence)
  - [Links](#links)
@@ -169,6 +170,81 @@ This implementation includes error handling and examples for usage.

  ---

+ ### Training
+
+ #### Hyperparameters
+
+ | Hyperparameter              | Value                                 |
+ |-----------------------------|---------------------------------------|
+ | Base Model                  | `Qwen/Qwen2.5-Coder-1.5B-Instruct`    |
+ | Fine-tuning Method          | QLoRA (Quantized Low-Rank Adaptation) |
+ | Task Type                   | `CAUSAL_LM`                           |
+ | Number of Epochs            | `11`                                  |
+ | Batch Size                  | `8`                                   |
+ | Gradient Accumulation Steps | `2`                                   |
+ | Effective Batch Size        | `16` (8 × 2)                          |
+ | Learning Rate               | `1e-4`                                |
+ | Optimizer                   | `AdamW`                               |
+ | Precision                   | `BF16 Mixed Precision`                |
+ | Evaluation Strategy         | `None`                                |
+ | Max Sequence Length         | `1024` tokens                         |
+ | Logging Steps               | every `1000` steps                    |
+ | Save Checkpoint Steps       | every `7200` steps                    |
+ | Output Directory            | Overwritten per run                   |
+ | Experiment Tracking         | `MLflow` (local tracking)             |
+ | Experiment Name             | `AssistantFineTuning`                 |
+ | MLflow Run Name             | `AssistantFT`                         |
+
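As a rough orientation, the table above maps onto Hugging Face `TrainingArguments` along these lines. This is a minimal sketch based on the public `transformers` API; the output directory is a placeholder and the actual training script may differ.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameter table above.
# The MLflow experiment name ("AssistantFineTuning") is usually picked up from the
# MLFLOW_EXPERIMENT_NAME environment variable rather than a TrainingArguments field.
training_args = TrainingArguments(
    output_dir="output",                  # placeholder; overwritten per run
    overwrite_output_dir=True,
    num_train_epochs=11,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,        # effective batch size: 8 x 2 = 16
    learning_rate=1e-4,
    optim="adamw_torch",                  # AdamW
    bf16=True,                            # BF16 mixed precision
    eval_strategy="no",                   # `evaluation_strategy` on older transformers
    logging_steps=1000,
    save_steps=7200,
    report_to="mlflow",
    run_name="AssistantFT",
)
```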
+ #### PEFT (QLoRA) Configuration
+
+ | Parameter       | Value                    |
+ |-----------------|--------------------------|
+ | LoRA Rank (`r`) | `16`                     |
+ | LoRA Alpha      | `32`                     |
+ | LoRA Dropout    | `0.05`                   |
+ | Target Modules  | `all-linear`             |
+ | Modules Saved   | `lm_head`, `embed_token` |
+
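In `peft` terms, the adapter settings above correspond to a `LoraConfig` roughly like the following sketch; the field names come from the PEFT API and the values from the table.

```python
from peft import LoraConfig

# Sketch of the QLoRA adapter configuration described above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",                 # wrap every linear layer
    modules_to_save=["lm_head", "embed_token"],  # trained and saved in full, not as adapters
    task_type="CAUSAL_LM",
)
```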
+ #### Dataset
+
+ - **Train/Test Split:** `90%/10%`
+ - **Random Seed:** `19`
+ - **Train Batched:** `True`
+ - **Eval Batched:** `True`
+
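With the `datasets` library, that split corresponds to something like the sketch below; the data source is a placeholder, and only the split ratio and seed reflect the settings listed above.

```python
from datasets import load_dataset

# Placeholder data source; only test_size and seed reflect the settings above.
dataset = load_dataset("json", data_files="data.jsonl")["train"]
splits = dataset.train_test_split(test_size=0.1, seed=19)  # 90% / 10%, seed 19
train_dataset, eval_dataset = splits["train"], splits["test"]
```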
+ #### Tokenizer Configuration
+
+ - **Truncation:** Enabled (`max_length=1024`)
+ - **Masked Language Modeling (MLM):** `False`
+
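A minimal sketch of that preprocessing, assuming a hypothetical `text` field; `mlm=False` in the data collator yields standard causal-LM labels, matching the setting above.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-1.5B-Instruct")

def tokenize_fn(batch):
    # "text" is a hypothetical field name; the actual dataset schema may differ.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

# mlm=False -> causal-LM labels (input ids copied to labels), no masking.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```

Both splits from the dataset sketch would then be passed through `tokenize_fn` via `map(..., batched=True)`, which is presumably what the *Train/Eval Batched: `True`* entries above refer to.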
+ #### Speeds, Sizes, Times
+
+ - **Total Training Time:** ~11 hours
+ - **Checkpoint Frequency:** every `7200` steps
+ - **Checkpoint Steps:**
+   - `checkpoint-7200`
+   - `checkpoint-14400`
+   - `checkpoint-21600`
+   - `checkpoint-28800`
+   - `checkpoint-36000`
+   - `checkpoint-39600` *(final checkpoint)*
+
+ #### Compute Infrastructure
+
+ **Hardware:**
+
+ - GPU: **1 × NVIDIA L40S (48 GB VRAM)**
+ - RAM: **62 GB**
+ - CPU: **16 vCPU**
+
+ **Software:**
+
+ - OS: **Ubuntu 22.04**
+ - Framework: **PyTorch 2.4.0**
+ - CUDA Version: **12.4.1**
+
+ ---
+
  ### Documentations

  - [CONTRIBUTING](CONTRIBUTING.md)