Update README.md
README.md CHANGED
@@ -17,7 +17,7 @@ Training was done on [Karolina](https://www.it4i.cz/en) cluster.
 # Loss
 Below we:
 - (i) demonstrate the convergence speed of the released model (`TINYLLAMA1.2B_cztokenizer64k_align1.7k_tllama1.1B_C2048_lr1e-04_150k`, at the 160k step).
-- (ii) justify the contributions of our vocabulary swap method. We swap 1.7K tokens in this run, similarly as for our other models (see [Czech-GPT-2-XL-133k](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k))
+- (ii) justify the contribution of our vocabulary swap method by comparing the swapped model with a model trained from scratch with the same hyperparameters (`scratch_cztokenizer64k_tllama1.1B_C2048_lr1e-04_150k`). We swap 1.7K tokens in this run, similarly to our other models (see [Czech-GPT-2-XL-133k](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k)).

 ## Train Cross-Entropy
 <img src="figures/tllama_train.png" width="900"/>
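The vocabulary swap in point (ii) is only named here, not spelled out. Below is a minimal sketch of one way such an initialization can look, assuming it copies embedding rows for tokens shared between the original tokenizer and the new Czech 64k tokenizer and leaves the rest randomly initialized; the paths and the overlap heuristic are illustrative assumptions, not the authors' exact procedure (see the linked Czech-GPT-2-XL-133k card for their description).

```python
# Sketch of a vocabulary-swap style embedding initialization (assumed overlap-copy
# heuristic; the exact alignment of the 1.7K swapped tokens is not described here).
# All model/tokenizer paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("path/to/original-tinyllama")   # placeholder
new_tok = AutoTokenizer.from_pretrained("path/to/czech-64k-tokenizer")  # placeholder
model = AutoModelForCausalLM.from_pretrained("path/to/original-tinyllama")

old_emb = model.get_input_embeddings().weight.data          # [old_vocab, hidden]
hidden_size = old_emb.shape[1]

# New embedding table for the Czech vocabulary, randomly initialized.
new_emb = torch.empty(len(new_tok), hidden_size).normal_(mean=0.0, std=0.02)

# Copy rows for tokens whose surface form exists in both vocabularies.
old_vocab = old_tok.get_vocab()
copied = 0
for token, new_id in new_tok.get_vocab().items():
    old_id = old_vocab.get(token)
    if old_id is not None:
        new_emb[new_id] = old_emb[old_id]
        copied += 1
print(f"initialized {copied} overlapping tokens from the original embeddings")

# Swap in the new table; an untied output head (lm_head) would need the same treatment.
model.resize_token_embeddings(len(new_tok))
model.get_input_embeddings().weight.data.copy_(new_emb)
```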