Add training details
README.md
@@ -119,11 +119,12 @@ Please refer to [togethercomputer/RedPajama-Data-1T](https://huggingface.co/data
 
 **Training Procedure**
 
-- **Hardware:**
-- **Optimizer:**
-- **
+- **Hardware:** 512 nodes of 6xV100 (IBM Power9), on the OLCF Summit cluster
+- **Optimizer:** Apex FusedAdam
+- **Parallelism:** Pipeline parallel 12, model parallel 2
+- **Gradient Accumulations**: 8 (global batch size 4M tokens)
 - **Num of Tokens:** 800B Tokens
-- **Learning rate:**
+- **Learning rate:** 0.00012
 
 ## Community
 
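The added training bullets imply a few derived quantities that the README does not state outright. Below is a minimal sketch that computes them, under two loudly labeled assumptions: that the data-parallel degree is total GPUs divided by (pipeline parallel × model parallel), as in Megatron-style 3D parallelism, and that "4M tokens" means 4 × 2**20 tokens. The names `TrainingLayout` and `derive_layout` are hypothetical and not part of any training codebase referenced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingLayout:
    total_gpus: int
    data_parallel: int
    micro_batch_tokens: float  # tokens per data-parallel replica per micro-batch
    optimizer_steps: float     # global batches needed to consume total_tokens


def derive_layout(nodes: int, gpus_per_node: int, pipeline_parallel: int,
                  model_parallel: int, grad_accum: int,
                  global_batch_tokens: int, total_tokens: int) -> TrainingLayout:
    """Hypothetical helper: derive the parallel layout from the README bullets."""
    total_gpus = nodes * gpus_per_node
    # Assumption: one model replica spans pipeline_parallel * model_parallel GPUs.
    gpus_per_replica = pipeline_parallel * model_parallel
    if total_gpus % gpus_per_replica != 0:
        raise ValueError("total GPUs must be divisible by pipeline x model parallel")
    data_parallel = total_gpus // gpus_per_replica
    # One global batch = data_parallel replicas x grad_accum micro-batches each.
    micro_batch_tokens = global_batch_tokens / (data_parallel * grad_accum)
    optimizer_steps = total_tokens / global_batch_tokens
    return TrainingLayout(total_gpus, data_parallel, micro_batch_tokens, optimizer_steps)


layout = derive_layout(
    nodes=512, gpus_per_node=6,       # "512 nodes of 6xV100"
    pipeline_parallel=12, model_parallel=2,
    grad_accum=8,
    global_batch_tokens=4 * 2**20,    # assumption: "4M" = 4 * 2**20 tokens
    total_tokens=800 * 10**9,         # "800B Tokens"
)
print(layout.total_gpus, layout.data_parallel, layout.micro_batch_tokens)
# 3072 128 4096.0
print(round(layout.optimizer_steps))
# 190735
```

If "4M" is instead read as 4 × 10**6 tokens, the same arithmetic gives 3906.25 tokens per micro-batch and exactly 200,000 optimizer steps. The binary reading is the one that divides evenly, but the README does not say which is meant.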