mfajcik committed · verified · Commit e8f8b0c · 1 Parent(s): 8f77bd2

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -17,7 +17,7 @@ Training was done on [Karolina](https://www.it4i.cz/en) cluster.
 # Loss
 Below we
 - (i) demonstrate the convergence speed of released model (`TINYLLAMA1.2B_cztokenizer64k_align1.7k_tllama1.1B_C2048_lr1e-04_150k`, at 160k step).
-- (ii) justify the contributions of our vocabulary swap method. We swap 1.7K tokens in this run, similarly as for our other models (see [Czech-GPT-2-XL-133k](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k)), by comparing the swapped model with model trained from scratch (using same hyperparameters) `scratch_cztokenizer64k_tllama1.1B_C2048_lr1e-04_150k`.
+- (ii) justify the contributions of our vocabulary swap method by comparing the swapped model with a model trained from scratch (using the same hyperparameters), `scratch_cztokenizer64k_tllama1.1B_C2048_lr1e-04_150k`. We swap 1.7K tokens in this run, as for our other models (see [Czech-GPT-2-XL-133k](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k)).
 
 ## Train Cross-Entropy
 <img src="figures/tllama_train.png" width="900"/>
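
The vocabulary-swap idea referenced in the edited bullet (reusing the source model's embeddings for the ~1.7K aligned tokens of the new tokenizer, while freshly initializing the rest) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual code: the `alignment` mapping, array shapes, and initialization scale are all assumptions.

```python
import numpy as np

def swap_vocab(src_emb: np.ndarray,
               new_vocab_size: int,
               alignment: dict[int, int],
               seed: int = 0) -> np.ndarray:
    """Build an embedding matrix for a new tokenizer.

    Tokens listed in `alignment` (new_id -> src_id) reuse the source
    model's embedding rows; every other row is randomly initialized.
    """
    rng = np.random.default_rng(seed)
    dim = src_emb.shape[1]
    # hypothetical init scale; real runs would match the model's init scheme
    new_emb = rng.normal(0.0, 0.02, size=(new_vocab_size, dim))
    for new_id, src_id in alignment.items():
        new_emb[new_id] = src_emb[src_id]  # copy the aligned embedding row
    return new_emb

# toy example: 10-token source vocab, 8-token new vocab, 3 aligned tokens
src = np.arange(10 * 4, dtype=np.float64).reshape(10, 4)
emb = swap_vocab(src, new_vocab_size=8, alignment={0: 5, 1: 2, 7: 9})
```

Only the aligned rows carry information over from the source model; the comparison against `scratch_cztokenizer64k_tllama1.1B_C2048_lr1e-04_150k` then isolates how much that transferred initialization helps convergence.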