Update README.md
README.md
CHANGED
@@ -136,15 +136,16 @@ The NT-Java-1.1B model has been trained on publicly available datasets and is of
 
 ## Model
 
-- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective
-- **Pretraining steps:**
-- **
+- **Architecture:** GPT-2 model with Multi-Query Attention and Fill-in-the-Middle objective.
+- **Pretraining steps:** 100k
+- **Context length:** 8K tokens
+- **Pretraining tokens:** 22 billion
 - **Precision:** bfloat16
 
 ## Hardware
 
 - **GPUs:** 6 NVIDIA A100 80GB
-- **Training time:** 
+- **Training time:** 10 days
 
 ## Software
 
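As a rough sanity check on the figures this change adds, the sketch below derives tokens per optimizer step, total GPU-hours, and approximate per-GPU throughput. It assumes "100k" means exactly 100,000 steps, "22 billion" means 22e9 tokens, and "10 days" means 240 hours of continuous wall-clock training on all 6 GPUs. None of these readings is confirmed by the README, and the function and constant names are hypothetical.

```python
# Back-of-the-envelope figures from the README values in this diff.
# ASSUMPTIONS: "100k" steps = 100_000, "22 billion" tokens = 22e9,
# "10 days" = 240 hours of uninterrupted training on all GPUs.
PRETRAINING_STEPS = 100_000
PRETRAINING_TOKENS = 22_000_000_000
NUM_GPUS = 6
TRAINING_DAYS = 10


def training_summary(steps: int, tokens: int, gpus: int, days: int) -> dict:
    """Derive rough per-step and per-GPU figures from the headline numbers."""
    wall_clock_seconds = days * 24 * 3600
    return {
        # Average number of tokens consumed per optimizer step.
        "tokens_per_step": tokens / steps,
        # Total GPU time, assuming every GPU ran for the whole period.
        "gpu_hours": gpus * days * 24,
        # Average throughput per GPU over the whole run.
        "tokens_per_gpu_second": tokens / (gpus * wall_clock_seconds),
    }


summary = training_summary(
    PRETRAINING_STEPS, PRETRAINING_TOKENS, NUM_GPUS, TRAINING_DAYS
)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:,.1f}")
```

Because the README values are rounded, these derived numbers are order-of-magnitude estimates rather than measured training statistics.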