Update README.md
README.md (CHANGED)
````diff
@@ -1,13 +1,13 @@
 ---
 library_name: transformers
 datasets:
-- DataSeer/si-summarization-votes-r1-081725
+- DataSeer/si-summarization-votes-r1-081725
 base_model: Qwen/Qwen3-32B
 tags:
-- lora
-- supervised-fine-tuning
-- summarization
-- qwen3
+- lora
+- supervised-fine-tuning
+- summarization
+- qwen3
 ---
 
 # Qwen3-32B Summarization LoRA Adapter
````
````diff
@@ -45,8 +45,8 @@ The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset
 
 ### Training Configuration
 
-- **Training epochs:**
-- **Learning rate:** 1e-
+- **Training epochs:** 3
+- **Learning rate:** 1e-5 (0.00001)
 - **Batch size:** 1 per device
 - **Gradient accumulation steps:** 8
 - **Effective batch size:** 8
````
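For context, the hyperparameters filled in by the hunk above map directly onto a standard Hugging Face fine-tuning configuration. The sketch below is illustrative only: the LoRA rank, alpha, dropout, target modules, and output path are assumptions, not values stated in this card; only the values commented "from the card" come from the diff.

```python
# Illustrative LoRA SFT configuration matching the card's stated hyperparameters.
# LoRA rank/alpha/dropout/target_modules and the output path are ASSUMED, not from the card.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                                     # assumed
    lora_alpha=32,                                            # assumed
    lora_dropout=0.05,                                        # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen3-32b-summarization-lora",  # hypothetical path
    num_train_epochs=3,                         # from the card
    learning_rate=1e-5,                         # from the card
    per_device_train_batch_size=1,              # from the card
    gradient_accumulation_steps=8,              # from the card (effective batch size 8)
    bf16=True,                                  # card: bfloat16 precision
    gradient_checkpointing=True,                # card: gradient checkpointing
    logging_steps=10,
)
```

Passing these to `transformers.Trainer` (or `trl.SFTTrainer`) with a PEFT-wrapped Qwen3-32B would reproduce the 1 × 8 = 8 effective batch size listed above.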
````diff
@@ -58,17 +58,17 @@ The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset
 
 ### Training Results
 
-- **Final training loss:** 0.
-- **Mean token accuracy:**
-- **Total training steps:**
-- **Training runtime:**
+- **Final training loss:** 0.5931
+- **Mean token accuracy:** 84.41%
+- **Total training steps:** 93
+- **Training runtime:** 56.6 minutes (3,398 seconds)
 - **Training samples per second:** 0.216
-- **Final learning rate:**
+- **Final learning rate:** 4.56e-8
 
 ### Hardware & Performance
 
 - **Hardware:** 8x NVIDIA H100 80GB HBM3
-- **Training time:** ~
+- **Training time:** ~57 minutes
 - **Memory optimization:** Gradient checkpointing, bfloat16 precision
 
 ## Usage
````
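The metrics added in this hunk are roughly self-consistent, which is worth checking when filling in a model card. The sketch below uses only numbers reported in the card; the "implied examples per epoch" figure is an inference from those numbers, not a stated dataset size.

```python
# Back-of-the-envelope consistency check using only values reported in the card.
total_steps = 93                 # total training steps
effective_batch_size = 8         # 1 per device x 8 gradient accumulation steps
runtime_seconds = 3398           # 56.6 minutes
epochs = 3

samples_processed = total_steps * effective_batch_size   # 744 sequences
throughput = samples_processed / runtime_seconds          # ~0.219 samples/s

print(f"samples processed: {samples_processed}")
print(f"implied throughput: {throughput:.3f} samples/s (card reports 0.216)")
print(f"implied examples per epoch: ~{samples_processed / epochs:.0f}")
```

The small gap between ~0.219 and the reported 0.216 samples/s likely reflects a partially filled final accumulation step per epoch.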
````diff
@@ -91,6 +91,6 @@ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
 model = PeftModel.from_pretrained(base_model, "path/to/adapter")
 ```
 
-
 ### Environmental Impact
-
+
+Training was conducted on high-performance H100 GPUs for approximately 57 minutes, representing a relatively efficient fine-tuning process thanks to the LoRA approach which only trains ~0.1% of the total model parameters.
````
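The usage snippet touched by this hunk appears only partially in the diff context. A fuller, self-contained version is sketched below as an illustration, not a verbatim copy of the card: the adapter path is the same placeholder the README uses, and the chat-template prompt is generic Qwen3 usage rather than something specified by this commit. The `print_trainable_parameters()` call ties back to the ~0.1% trainable-parameter claim in the new Environmental Impact paragraph.

```python
# Illustrative end-to-end usage sketch (not verbatim from the card).
# "path/to/adapter" is the same placeholder used in the README.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,   # matches the bfloat16 training precision
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
model = PeftModel.from_pretrained(base_model, "path/to/adapter")

# The card reports roughly 0.1% of total parameters as trainable via LoRA.
model.print_trainable_parameters()

# Generic chat-style summarization prompt (assumed format, not from the card).
messages = [{"role": "user", "content": "Summarize the following article:\n\n<article text>"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```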