Commit 6efab79
Parent(s): c8d4750

Update README.md
README.md CHANGED
@@ -138,6 +138,11 @@ The model has been modified from a standard transformer in the following ways:
 | vocab size      | 50432 |
 | sequence length | 2048  |
 
+### Training Configuration
+
+This model was trained on 8 A100-80GBs for about 8.2 hours, followed by training for 6.7 hours on 32 A100-40GBs using the [MosaicML Platform](https://www.mosaicml.com/platform).
+The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the AdamW optimizer.
+
 ## Limitations and Biases
 
 _The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)_
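For context on the added paragraph: below is a minimal, hypothetical sketch of what "sharded data parallelism using FSDP" with an AdamW optimizer can look like in PyTorch. It is not MosaicML's actual training code. Only the vocab size (50432) and sequence length (2048) come from the table above; the stand-in model, hidden size, learning rate, batch size, and `torchrun` launch are illustrative assumptions.

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # Assumes a `torchrun --nproc_per_node=<gpus> train.py` launch, which
    # sets the RANK / WORLD_SIZE / LOCAL_RANK environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for the actual transformer; only the vocab size (50432) and
    # the sequence length (2048) below come from the README table.
    model = torch.nn.Sequential(
        torch.nn.Embedding(50432, 512),
        torch.nn.Linear(512, 50432),
    )

    # FSDP shards parameters, gradients, and optimizer state across ranks
    # ("sharded data parallelism").
    model = FSDP(model, device_id=local_rank)

    # AdamW is constructed after wrapping so it references the sharded
    # parameters; lr is an arbitrary placeholder.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # One illustrative step on random token ids.
    tokens = torch.randint(0, 50432, (4, 2048), device="cuda")
    logits = model(tokens)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, 50432), tokens.view(-1)
    )
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Constructing the optimizer after the FSDP wrap matters: wrapping flattens and shards the parameters, so AdamW's moment buffers end up sharded along with them. The run described in the commit also spanned multiple machines (8 A100-80GBs, then 32 A100-40GBs) via the MosaicML Platform; this sketch covers only a single node.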