
Khmer‑English GPT‑2 Tokenizer

  • Vocab size: 50,257
  • Algorithm: Byte‑Level BPE (byte_fallback)
  • Special tokens: <|endoftext|>, <|bos|>, <|pad|>, <|unk|>
  • Trained on: metythorn/khmer‑english‑corpus
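
As a rough illustration of why a byte-level BPE vocabulary can represent any Khmer or English string without an unknown token, the sketch below reproduces the byte-to-character table popularized by GPT-2 and round-trips a mixed sample through it. It assumes this tokenizer follows the standard GPT-2 byte-level convention, which the card does not state explicitly. The helper names (`to_byte_symbols`, `from_byte_symbols`) are hypothetical, and the learned BPE merges are not modeled here.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map each of the 256 byte values to a distinct printable character.

    Assumption: this mirrors the GPT-2 byte-level convention; it is not
    taken from this repository's files.
    """
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Remaining bytes (controls, space, etc.) are shifted to U+0100 and up.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_symbols(text: str) -> list[str]:
    """Hypothetical helper: split text into per-byte symbols (pre-merge)."""
    return [BYTE_TO_CHAR[b] for b in text.encode("utf-8")]


def from_byte_symbols(symbols: list[str]) -> str:
    """Hypothetical helper: invert to_byte_symbols."""
    return bytes(CHAR_TO_BYTE[s] for s in symbols).decode("utf-8")


sample = "ក hello"  # one Khmer consonant (3 UTF-8 bytes), a space, 5 ASCII letters
symbols = to_byte_symbols(sample)
restored = from_byte_symbols(symbols)
print(len(symbols), restored == sample)
```

Each Khmer code point expands to three byte symbols before any merges apply, so the merges learned from the training corpus determine how compactly Khmer text is encoded in practice.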