# Khmer-English GPT-2 Tokenizer
* **Vocab size:** 50,257
* **Algorithm:** Byte-Level BPE (byte_fallback)
* **Special tokens:** `<|endoftext|>`, `<|bos|>`, `<|pad|>`, `<|unk|>`
* **Trained on:** `metythorn/khmer-english-corpus`
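
As a rough illustration of what "byte-level" means for mixed Khmer and English text, the sketch below shows the raw UTF-8 byte view that a Byte-Level BPE tokenizer starts from before any learned merges are applied. It uses only the Python standard library and does not load this tokenizer. The helper name `utf8_byte_ids` and the sample strings are assumptions made for the example, not part of this repository.

```python
from typing import List

# Illustrative only: this is the byte-level starting point of Byte-Level BPE,
# not the trained tokenizer itself. Learned merges would combine these bytes
# into longer tokens.


def utf8_byte_ids(text: str) -> List[int]:
    """Return the UTF-8 byte values (0-255) that make up `text`."""
    return list(text.encode("utf-8"))


# "Khmer" written in Khmer script, spelled with escapes so the code points
# are unambiguous: U+1781 U+17D2 U+1798 U+17C2 U+179A.
khmer = "\u1781\u17d2\u1798\u17c2\u179a"
english = "Khmer"

khmer_bytes = utf8_byte_ids(khmer)
english_bytes = utf8_byte_ids(english)

# Each code point in the Khmer block (U+1780-U+17FF) takes 3 bytes in UTF-8,
# while ASCII letters take 1 byte each.
print(len(khmer), "code points ->", len(khmer_bytes), "bytes")
print(len(english), "code points ->", len(english_bytes), "bytes")

# The byte sequence round-trips losslessly, which is why a byte-level
# vocabulary can represent any input string.
round_trip = bytes(khmer_bytes).decode("utf-8")
print(round_trip == khmer)
```

Because every string reduces to bytes in the range 0-255, unseen Khmer words can always be represented; the learned merges mainly determine how compactly they are encoded.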