# Khmer-English GPT-2 Tokenizer
* **Vocab size:** 50,257
* **Algorithm:** Byte-Level BPE (`byte_fallback`)
* **Special tokens:** `<|endoftext|>`, `<|bos|>`, `<|pad|>`, `<|unk|>`
* **Trained on:** `metythorn/khmer-english-corpus`
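The card does not include training code. As a rough illustration of what "Byte-Level BPE" means for Khmer text, the sketch below reproduces the GPT-2 byte-to-symbol mapping (`bytes_to_unicode`, as published with OpenAI's GPT-2 encoder). It assumes this tokenizer uses the standard GPT-2 byte alphabet.

Every Khmer code point (U+1780 to U+17FF) encodes to three UTF-8 bytes. Any Khmer string therefore reduces to the same 256-symbol base alphabet that BPE merges are learned over. The helper names `to_byte_symbols` and `from_byte_symbols` are hypothetical and are not part of this tokenizer's API.

```python
# Sketch of the byte-level pre-processing step used by GPT-2-style BPE.
# Assumption: this tokenizer uses the standard GPT-2 byte alphabet.
# to_byte_symbols / from_byte_symbols are hypothetical illustration helpers.


def bytes_to_unicode() -> dict[int, str]:
    """Map each of the 256 byte values to a printable Unicode character."""
    # Printable byte ranges keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted above U+00FF.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {v: k for k, v in BYTE_ENCODER.items()}


def to_byte_symbols(text: str) -> list[str]:
    """UTF-8 encode text and map every byte to its base-alphabet symbol."""
    return [BYTE_ENCODER[b] for b in text.encode("utf-8")]


def from_byte_symbols(symbols: list[str]) -> str:
    """Invert to_byte_symbols: symbols -> bytes -> UTF-8 text."""
    return bytes(BYTE_DECODER[s] for s in symbols).decode("utf-8")


if __name__ == "__main__":
    # "សួស្តី hello": 6 Khmer code points (3 bytes each) plus 6 ASCII chars.
    text = "\u179f\u17bd\u179f\u17d2\u178f\u17b8 hello"
    symbols = to_byte_symbols(text)
    print(len(text), len(symbols))
    assert from_byte_symbols(symbols) == text
```

Because every possible byte has a symbol, the standard GPT-2 scheme can represent any input without an unknown token. The learned merges then decide how compactly Khmer byte sequences are grouped into the 50,257-entry vocabulary.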