
# Khmer-English GPT-2 Tokenizer

* **Vocab size:** 50,257
* **Algorithm:** Byte-level BPE (`byte_fallback`)
* **Special tokens:** `<|endoftext|>`, `<|bos|>`, `<|pad|>`, `<|unk|>`
* **Trained on:** `metythorn/khmer-english-corpus`
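
To illustrate what byte-level BPE means for Khmer text, here is a minimal sketch of the byte-to-symbol step that GPT-2 style tokenizers apply before any merges. It assumes this tokenizer follows the standard GPT-2 byte mapping, which this README does not confirm. The helper names `to_byte_symbols` and `from_byte_symbols` are hypothetical and not part of any library. The sketch does not perform BPE merges and does not load this tokenizer.

```python
from __future__ import annotations


def bytes_to_unicode() -> dict[int, str]:
    """Map each of the 256 byte values to a printable Unicode character.

    This mirrors the mapping used by the original GPT-2 encoder; whether
    this tokenizer uses the same table is an assumption.
    """
    # Bytes that already correspond to printable characters keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted above U+00FF.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_symbols(text: str) -> list[str]:
    """Hypothetical helper: UTF-8 encode text and map every byte to its symbol."""
    return [BYTE_TO_CHAR[b] for b in text.encode("utf-8")]


def from_byte_symbols(symbols: list[str]) -> str:
    """Hypothetical helper: invert to_byte_symbols."""
    return bytes(CHAR_TO_BYTE[s] for s in symbols).decode("utf-8")


sample = "ខ្មែរ Khmer"
symbols = to_byte_symbols(sample)

# Khmer code points take 3 bytes each in UTF-8, so the symbol count
# exceeds the character count; BPE merges then compress frequent byte runs.
print(len(sample), "characters ->", len(symbols), "byte symbols")
print(from_byte_symbols(symbols) == sample)
```

Because every possible byte has a symbol in the base alphabet, any Khmer or English input can be represented without falling back to an unknown token at this stage. The learned merges then determine how compactly it is encoded.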