Update README.md
README.md CHANGED
@@ -17,7 +17,7 @@ Training was done on [Karolina](https://www.it4i.cz/en) cluster.
 # Loss
 Below we:
 - (i) demonstrate the convergence speed of the released model (`TINYLLAMA1.2B_cztokenizer64k_align1.7k_tllama1.1B_C2048_lr1e-04_150k`, at the 160k step).
-- (ii) justify the contributions of our vocabulary swap method. We swap 1.7K tokens in this run, similarly as for our other models (see [Czech-GPT-2-XL-133k](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k))
+- (ii) justify the contribution of our vocabulary swap method by comparing the swapped model with a model trained from scratch with the same hyperparameters (`scratch_cztokenizer64k_tllama1.1B_C2048_lr1e-04_150k`). We swap 1.7K tokens in this run, similarly to our other models (see [Czech-GPT-2-XL-133k](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k)).

 ## Train Cross-Entropy
 <img src="figures/tllama_train.png" width="900"/>
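The vocabulary swap in point (ii) is only named here, not spelled out. Below is a minimal sketch of one way such an initialization can look, assuming it copies embedding rows for tokens shared between the original tokenizer and the new Czech 64k tokenizer and leaves the rest randomly initialized; the paths and the overlap heuristic are illustrative assumptions, not the authors' exact procedure (see the linked Czech-GPT-2-XL-133k card for their description).

```python
# Sketch of a vocabulary-swap style embedding initialization (assumed overlap-copy
# heuristic; the exact alignment of the 1.7K swapped tokens is not described here).
# All model/tokenizer paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

old_tok = AutoTokenizer.from_pretrained("path/to/original-tinyllama")   # placeholder
new_tok = AutoTokenizer.from_pretrained("path/to/czech-64k-tokenizer")  # placeholder
model = AutoModelForCausalLM.from_pretrained("path/to/original-tinyllama")

old_emb = model.get_input_embeddings().weight.data          # [old_vocab, hidden]
hidden_size = old_emb.shape[1]

# New embedding table for the Czech vocabulary, randomly initialized.
new_emb = torch.empty(len(new_tok), hidden_size).normal_(mean=0.0, std=0.02)

# Copy rows for tokens whose surface form exists in both vocabularies.
old_vocab = old_tok.get_vocab()
copied = 0
for token, new_id in new_tok.get_vocab().items():
    old_id = old_vocab.get(token)
    if old_id is not None:
        new_emb[new_id] = old_emb[old_id]
        copied += 1
print(f"initialized {copied} overlapping tokens from the original embeddings")

# Swap in the new table; an untied output head (lm_head) would need the same treatment.
model.resize_token_embeddings(len(new_tok))
model.get_input_embeddings().weight.data.copy_(new_emb)
```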