metythorn/gpt2-tokenizer
Khmer‑English GPT‑2 Tokenizer
Vocab size: 50,257
Algorithm: Byte-Level BPE (byte_fallback)
Special tokens: <|endoftext|>, <|bos|>, <|pad|>, <|unk|>
Trained on: metythorn/khmer-english-corpus
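The card lists properties but no usage, so here is a minimal sketch of how the tokenizer might be loaded and checked against the figures above. It assumes the repository ships files that `transformers.AutoTokenizer` can read, which the card does not confirm. The helper names `load_tokenizer` and `check_against_card` are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Values copied from the card above.
REPO_ID = "metythorn/gpt2-tokenizer"
SPECIAL_TOKENS: List[str] = ["<|endoftext|>", "<|bos|>", "<|pad|>", "<|unk|>"]
CARD_VOCAB_SIZE = 50_257


def load_tokenizer(repo_id: str = REPO_ID):
    """Download the tokenizer from the Hub (needs `transformers` and network access).

    Assumption: the repository contains files that AutoTokenizer can load.
    """
    # Imported lazily so the check below can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def check_against_card(tokenizer) -> Dict[str, object]:
    """Compare a loaded tokenizer with the figures stated on the card."""
    # get_vocab() maps token strings to ids and includes added special tokens.
    vocab = tokenizer.get_vocab()
    return {
        "total_tokens": len(vocab),
        "matches_card_size": len(vocab) == CARD_VOCAB_SIZE,
        "missing_special_tokens": [t for t in SPECIAL_TOKENS if t not in vocab],
    }
```

Calling `check_against_card(load_tokenizer())` returns a small report rather than raising, since the card does not say whether the 50,257 figure counts the four special tokens. Once loaded, `tokenizer.tokenize(...)` on mixed Khmer and English text shows how the byte-level merges split each script.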