parthsarin committed
Commit 4587836 · verified · Parent: 5714001

Update README.md

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -1,13 +1,13 @@
  ---
  library_name: transformers
  datasets:
- - DataSeer/si-summarization-votes-r1-081725
+ - DataSeer/si-summarization-votes-r1-081725
  base_model: Qwen/Qwen3-32B
  tags:
- - lora
- - supervised-fine-tuning
- - summarization
- - qwen3
+ - lora
+ - supervised-fine-tuning
+ - summarization
+ - qwen3
  ---

  # Qwen3-32B Summarization LoRA Adapter
@@ -45,8 +45,8 @@ The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset

  ### Training Configuration

- - **Training epochs:** 2
- - **Learning rate:** 1e-3 (0.001)
+ - **Training epochs:** 3
+ - **Learning rate:** 1e-5 (0.00001)
  - **Batch size:** 1 per device
  - **Gradient accumulation steps:** 8
  - **Effective batch size:** 8
@@ -58,17 +58,17 @@ The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset

  ### Training Results

- - **Final training loss:** 0.3414
- - **Mean token accuracy:** 88.13%
- - **Total training steps:** 62
- - **Training runtime:** 37.9 minutes (2,273 seconds)
+ - **Final training loss:** 0.5931
+ - **Mean token accuracy:** 84.41%
+ - **Total training steps:** 93
+ - **Training runtime:** 56.6 minutes (3,398 seconds)
  - **Training samples per second:** 0.216
- - **Final learning rate:** 5.77e-6
+ - **Final learning rate:** 4.56e-8

  ### Hardware & Performance

  - **Hardware:** 8x NVIDIA H100 80GB HBM3
- - **Training time:** ~38 minutes
+ - **Training time:** ~57 minutes
  - **Memory optimization:** Gradient checkpointing, bfloat16 precision

  ## Usage
@@ -91,6 +91,6 @@ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
  model = PeftModel.from_pretrained(base_model, "path/to/adapter")
  ```

-
  ### Environmental Impact
- Training was conducted on high-performance H100 GPUs for approximately 38 minutes, representing a relatively efficient fine-tuning process thanks to the LoRA approach which only trains ~0.1% of the total model parameters.
+
+ Training was conducted on high-performance H100 GPUs for approximately 57 minutes, representing a relatively efficient fine-tuning process thanks to the LoRA approach which only trains ~0.1% of the total model parameters.
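For context, the diff above only shows a fragment of the README's Usage section. The sketch below (not part of the commit) expands it into a self-contained example of loading the adapter on top of Qwen/Qwen3-32B with `peft`; the adapter path is a placeholder exactly as in the README, and the prompt format and generation settings are assumptions rather than anything specified in the commit.

```python
# Minimal sketch of loading the LoRA adapter described in this README.
# "path/to/adapter" is a placeholder, as in the diff; the prompt below is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16, matching the precision listed under Hardware & Performance.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Attach the LoRA adapter weights from this repository to the base model.
model = PeftModel.from_pretrained(base_model, "path/to/adapter")

# Illustrative summarization call (prompt format is an assumption, not taken from the commit).
prompt = "Summarize the following text:\n\nYour document text here."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Only the base model name, dtype, and the `PeftModel.from_pretrained` call come from the README shown in this diff; everything else is a reasonable default for transformers/peft inference.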