Update model description
README.md CHANGED

````diff
@@ -6,6 +6,11 @@ This model contains just the `IPUConfig` files for running the [gpt2](https://hu
 
 **This model contains no model weights, only an IPUConfig.**
 
+## Model description
+
+GPT2 is a large transformer-based language model built from transformer decoder blocks (BERT, by contrast, uses transformer encoder blocks). Layer normalisation is moved to the input of each sub-block, similar to a pre-activation residual network, and an additional layer normalisation is added after the final self-attention block.
+
+Paper link: [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf)
+
 ## Usage
 
 ```
````
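The pre-activation layout the new description refers to is easiest to see in code. Below is a minimal sketch in plain PyTorch, not code from this repo or from `transformers`; it omits GPT2's causal attention mask and shows only where the LayerNorms sit relative to each sub-block:

```python
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Sketch of a pre-activation ("pre-LN") transformer decoder block:
    LayerNorm is applied to the *input* of each sub-block, rather than
    after it as in the original Transformer/BERT."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln_1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Normalise before each sub-block, then add the residual.
        h = self.ln_1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln_2(x))
        return x
```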
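Since this repo ships only an `IPUConfig` and no weights, loading it typically looks like the sketch below. The repo id `Graphcore/gpt2-small-ipu` is an assumption (the hunk header truncates the actual link); the config would then be passed to `optimum.graphcore`'s `IPUTrainer` together with the upstream `gpt2` checkpoint, which supplies the weights.

```python
# Hedged sketch: load the IPUConfig from the Hub and inspect it.
# "Graphcore/gpt2-small-ipu" is an assumed repo id -- the diff truncates
# the real link, so substitute the id of the repo this card belongs to.
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig.from_pretrained("Graphcore/gpt2-small-ipu")
print(ipu_config)  # IPU-specific settings such as layers_per_ipu
```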