readme: correct and extend training section
README.md CHANGED

@@ -29,9 +29,11 @@ We use pretty much the same corpora as used for training the DBMDZ BERT model, t
 
 Thanks to the awesome Hugging Face team, it is possible to create byte-level BPE with their awesome [Tokenizers](https://github.com/huggingface/tokenizers) library.
 
-With the previously mentioned awesome Tokenizers library we created a
+With the previously mentioned awesome Tokenizers library we created a 50K byte-level BPE vocab based on the training corpora.
 
-After creating the vocab, we could train the GPT-2 for German on
+After creating the vocab, we could train the GPT-2 for German on a v3-8 TPU over the complete training corpus for 20 epochs. All hyperparameters
+can be found in the official JAX/FLAX documentation [here](https://github.com/huggingface/transformers/blob/master/examples/flax/language-modeling/README.md)
+from Transformers.
 
 # Using the model
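For reference, a 50K byte-level BPE vocab like the one described in the added lines can be trained with the Tokenizers library roughly as sketched below. The corpus file paths, the `min_frequency` cut-off, the special tokens and the output directory are placeholders, not the exact settings used for this model.

```python
import os

from tokenizers import ByteLevelBPETokenizer

model_dir = "./german-gpt2"  # placeholder output directory
os.makedirs(model_dir, exist_ok=True)

# Train a byte-level BPE tokenizer on the (German) training corpora.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus_part_1.txt", "corpus_part_2.txt"],  # placeholder corpus files
    vocab_size=50_000,                                 # 50K vocab, as stated in the README
    min_frequency=2,                                   # assumed cut-off, not confirmed by the source
    special_tokens=["<|endoftext|>"],                  # GPT-2's end-of-text token
)

# Writes vocab.json and merges.txt into the model directory.
tokenizer.save_model(model_dir)
```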
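The training run itself is launched with the causal language modeling script from the linked JAX/FLAX examples; the hyperparameters are documented there and are not repeated here. The sketch below only shows how a GPT-2 config and fast tokenizer could be assembled from the trained vocab before starting that script. Starting from the stock `gpt2` configuration and the model directory name are assumptions, not details confirmed by the commit.

```python
from transformers import GPT2Config, GPT2TokenizerFast

model_dir = "./german-gpt2"  # placeholder directory holding vocab.json / merges.txt

# Start from the standard GPT-2 architecture and only swap in the 50K vocab size.
# (Assumption: the base gpt2 architecture is used; the source does not say.)
config = GPT2Config.from_pretrained("gpt2", vocab_size=50_000)
config.save_pretrained(model_dir)

# Wrap the byte-level BPE files produced by the Tokenizers library into a
# GPT2TokenizerFast so the Flax training script can load it from the model directory.
tokenizer = GPT2TokenizerFast(
    vocab_file=f"{model_dir}/vocab.json",
    merges_file=f"{model_dir}/merges.txt",
)
tokenizer.save_pretrained(model_dir)

# Training is then started with the run_clm_flax.py script from the linked
# JAX/FLAX language-modeling README, on a v3-8 TPU for 20 epochs, using the
# hyperparameters documented there.
```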