Update README.md
README.md
CHANGED
|
@@ -135,6 +135,25 @@ This model is designed for:
|
|
| 135 |
3. Not intended for operational security decisions
|
| 136 |
4. Results should be interpreted with appropriate context
|
| 137 |
|
| 138 |
## Citation
|
| 139 |
```bibtex
|
| 140 |
@misc{conflllama,
|
| 135 |
3. Not intended for operational security decisions
|
| 136 |
4. Results should be interpreted with appropriate context
|
| 137 |
|
| 138 |
+
|
| 139 |
+
## Training Logs
|
| 140 |
+
<p align="center">
|
| 141 |
+
<img src="images/training-logs.png" alt="Training Logs" width="800"/>
|
| 142 |
+
</p>
|
| 143 |
+
|
| 144 |
+
The training logs show a successful training run with healthy convergence patterns:
|
| 145 |
+
|
| 146 |
+
**Loss & Learning Rate:**
|
| 147 |
+
- Loss decreases from 1.95 to ~0.90, with rapid initial improvement
|
| 148 |
+
- Learning rate follows a warmup/decay schedule, peaking at ~1.5×10^-4
|
| 149 |
+
|
| 150 |
+
**Training Stability:**
|
| 151 |
+
- Stable gradient norms (0.4-0.6 range)
|
| 152 |
+
- Consistent GPU memory usage (~5800 MB allocated, 7080 MB reserved)
|
| 153 |
+
- Steady training speed (~3.5s/step) with a brief interruption at step 800
|
| 154 |
+
|
| 155 |
+
Taken together, the graphs indicate stable optimization and steady resource utilization. The loss vs. learning rate plot suggests that the loss improves fastest at learning rates around 10^-4.
|
| 156 |
+
|
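For readers unfamiliar with warmup/decay schedules, the sketch below shows one common shape: a linear warmup followed by cosine decay. It is illustrative only. The peak of 1.5e-4 comes from the plot described above, while the cosine decay form, `WARMUP_STEPS`, `TOTAL_STEPS`, and the function name `lr_at` are assumptions and are not taken from the actual training configuration.

```python
import math

PEAK_LR = 1.5e-4      # peak learning rate reported in the training logs
WARMUP_STEPS = 100    # hypothetical value; not stated in this README
TOTAL_STEPS = 1000    # hypothetical value; not stated in this README


def lr_at(step: int,
          peak_lr: float = PEAK_LR,
          warmup_steps: int = WARMUP_STEPS,
          total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay.

    The decay shape is an assumption; the real run may have used a
    linear or other decay.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches the peak.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine decay from peak_lr down to zero.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 50, 99, 100, 500, 1000):
        print(f"step {s:4d}: lr = {lr_at(s):.3e}")
```

With these placeholder values the rate climbs to the peak by the end of warmup and then falls smoothly toward zero, which matches the general rise-then-fall shape expected of a warmup/decay curve.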
| 157 |
## Citation
|
| 158 |
```bibtex
|
| 159 |
@misc{conflllama,
|