# Khmer‑English GPT‑2 Tokenizer
* **Vocab size:** 50,257
* **Algorithm:** Byte-level BPE (`byte_fallback`)
* **Special tokens:** `<|endoftext|>`, `<|bos|>`, `<|pad|>`, `<|unk|>`
* **Trained on:** `metythorn/khmer-english-corpus`
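
To illustrate what "byte-level" means for mixed Khmer and English input, here is a minimal sketch of the UTF-8 byte view that a byte-level BPE tokenizer starts from before any merges are applied. It does not load this tokenizer or reproduce its learned merges; the helper names `to_byte_ids` and `from_byte_ids` and the sample string are hypothetical and exist only for this example.

```python
# Sketch only: the base byte alphabet under a byte-level BPE tokenizer.
# This is NOT the tokenizer itself; a trained tokenizer merges these bytes
# into larger vocabulary entries.

def to_byte_ids(text: str) -> list[int]:
    """Map text to its UTF-8 byte values (each in the range 0..255)."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: list[int]) -> str:
    """Invert to_byte_ids by decoding the byte values as UTF-8."""
    return bytes(ids).decode("utf-8")


sample = "Hello សួស្តី"  # hypothetical mixed English/Khmer sample
ids = to_byte_ids(sample)

# Every id fits in a 256-symbol base alphabet, so any input can be represented.
assert all(0 <= i < 256 for i in ids)

# The byte mapping is lossless.
assert from_byte_ids(ids) == sample

# Khmer code points (U+1780..U+17FF) take 3 bytes each in UTF-8; ASCII takes 1.
print(len("ក".encode("utf-8")), len("a".encode("utf-8")))
```

Because Khmer characters expand to several bytes each, the learned merges are what keep Khmer sequences short; the byte view above is only the fallback representation.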