readme: correct and extend training section
README.md CHANGED

@@ -29,9 +29,11 @@ We use pretty much the same corpora as used for training the DBMDZ BERT model, t
 
 Thanks to the awesome Hugging Face team, it is possible to create byte-level BPE with their awesome [Tokenizers](https://github.com/huggingface/tokenizers) library.
 
-With the previously mentioned awesome Tokenizers library we created a
+With the previously mentioned awesome Tokenizers library we created a 50K byte-level BPE vocab based on the training corpora.
 
-After creating the vocab, we could train the GPT-2 for German on
+After creating the vocab, we could train the GPT-2 for German on a v3-8 TPU over the complete training corpus for 20 epochs. All hyperparameters
+can be found in the official JAX/FLAX documentation [here](https://github.com/huggingface/transformers/blob/master/examples/flax/language-modeling/README.md)
+from Transformers.
 
 # Using the model
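For reference, a 50K byte-level BPE vocab like the one described in the added lines can be trained with the Tokenizers library roughly as sketched below. The corpus file paths, the `min_frequency` cut-off, the special tokens and the output directory are placeholders, not the exact settings used for this model.

```python
import os

from tokenizers import ByteLevelBPETokenizer

model_dir = "./german-gpt2"  # placeholder output directory
os.makedirs(model_dir, exist_ok=True)

# Train a byte-level BPE tokenizer on the (German) training corpora.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus_part_1.txt", "corpus_part_2.txt"],  # placeholder corpus files
    vocab_size=50_000,                                 # 50K vocab, as stated in the README
    min_frequency=2,                                   # assumed cut-off, not confirmed by the source
    special_tokens=["<|endoftext|>"],                  # GPT-2's end-of-text token
)

# Writes vocab.json and merges.txt into the model directory.
tokenizer.save_model(model_dir)
```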
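The training run itself is launched with the causal language modeling script from the linked JAX/FLAX examples; the hyperparameters are documented there and are not repeated here. The sketch below only shows how a GPT-2 config and fast tokenizer could be assembled from the trained vocab before starting that script. Starting from the stock `gpt2` configuration and the model directory name are assumptions, not details confirmed by the commit.

```python
from transformers import GPT2Config, GPT2TokenizerFast

model_dir = "./german-gpt2"  # placeholder directory holding vocab.json / merges.txt

# Start from the standard GPT-2 architecture and only swap in the 50K vocab size.
# (Assumption: the base gpt2 architecture is used; the source does not say.)
config = GPT2Config.from_pretrained("gpt2", vocab_size=50_000)
config.save_pretrained(model_dir)

# Wrap the byte-level BPE files produced by the Tokenizers library into a
# GPT2TokenizerFast so the Flax training script can load it from the model directory.
tokenizer = GPT2TokenizerFast(
    vocab_file=f"{model_dir}/vocab.json",
    merges_file=f"{model_dir}/merges.txt",
)
tokenizer.save_pretrained(model_dir)

# Training is then started with the run_clm_flax.py script from the linked
# JAX/FLAX language-modeling README, on a v3-8 TPU for 20 epochs, using the
# hyperparameters documented there.
```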