Commit 6efab79
Parent(s): c8d4750

Update README.md
README.md CHANGED
@@ -138,6 +138,11 @@ The model has been modified from a standard transformer in the following ways:
 | vocab size      | 50432 |
 | sequence length | 2048  |
 
+### Training Configuration
+
+This model was trained on 8 A100-80GBs for about 8.2 hours, followed by training for 6.7 hours on 32 A100-40GBs using the [MosaicML Platform](https://www.mosaicml.com/platform).
+The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the AdamW optimizer.
+
 ## Limitations and Biases
 
 _The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)_
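For context on the added paragraph: below is a minimal, hypothetical sketch of what "sharded data parallelism using FSDP" with an AdamW optimizer can look like in PyTorch. It is not MosaicML's actual training code. Only the vocab size (50432) and sequence length (2048) come from the table above; the stand-in model, hidden size, learning rate, batch size, and `torchrun` launch are illustrative assumptions.

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # Assumes a `torchrun --nproc_per_node=<gpus> train.py` launch, which
    # sets the RANK / WORLD_SIZE / LOCAL_RANK environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for the actual transformer; only the vocab size (50432) and
    # the sequence length (2048) below come from the README table.
    model = torch.nn.Sequential(
        torch.nn.Embedding(50432, 512),
        torch.nn.Linear(512, 50432),
    )

    # FSDP shards parameters, gradients, and optimizer state across ranks
    # ("sharded data parallelism").
    model = FSDP(model, device_id=local_rank)

    # AdamW is constructed after wrapping so it references the sharded
    # parameters; lr is an arbitrary placeholder.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # One illustrative step on random token ids.
    tokens = torch.randint(0, 50432, (4, 2048), device="cuda")
    logits = model(tokens)
    loss = torch.nn.functional.cross_entropy(
        logits.view(-1, 50432), tokens.view(-1)
    )
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Constructing the optimizer after the FSDP wrap matters: wrapping flattens and shards the parameters, so AdamW's moment buffers end up sharded along with them. The run described in the commit also spanned multiple machines (8 A100-80GBs, then 32 A100-40GBs) via the MosaicML Platform; this sketch covers only a single node.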