Update README.md
README.md CHANGED
@@ -23,7 +23,7 @@ GPT-Neo 1.3B was trained on the Pile, a large scale curated dataset created by EleutherAI
 
 ## Training procedure
 
-This model was trained for 380,000 steps
+This model was trained on the Pile for 380 billion tokens over 362,000 steps. It was trained as a masked autoregressive language model, using cross-entropy loss.
 
 ## Intended Use and Limitations
 
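The updated wording describes the standard autoregressive (next-token) cross-entropy objective. Below is a minimal, illustrative sketch of how that loss can be computed for the released checkpoint with the Hugging Face transformers library; this is an assumption-laden demonstration of the objective the card names, not EleutherAI's original training code.

```python
# Minimal sketch (not EleutherAI's training code): computing the
# autoregressive cross-entropy loss for the released GPT-Neo 1.3B
# checkpoint via Hugging Face transformers.
import torch
from transformers import AutoTokenizer, GPTNeoForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
model.eval()

text = "GPT-Neo 1.3B was trained on the Pile."
inputs = tokenizer(text, return_tensors="pt")

# With labels=input_ids, the model shifts the labels internally and
# returns the mean next-token cross-entropy loss, i.e. the
# autoregressive language-modeling objective the card describes.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(f"cross-entropy loss: {outputs.loss.item():.3f}")
```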