No Name Thai NER



Compact Thai token-classification model optimized for fast named-entity recognition (NER) and practical medical-text de-identification. This checkpoint was trained for robust entity detection on Thai clinical and conversational text and is intended for use in context-preserving anonymization pipelines.
At Looloo Health, we're passionate about making healthcare more accessible and affordable for everyone. The model is a core component of our AI Medical Scribe, PresScribe, where it helps ensure patient privacy through automated de-identification. We believe that unlocking the potential of clinical data is key to this goal, and we're excited to share our work with the community.
Features
- Detects common sensitive entity types found in medical text (names, phone numbers, IDs, addresses, dates, etc.).
- Lightweight and fast to run on CPUs with the Hugging Face transformers pipeline.
- Designed to be used as part of a de-identification workflow (post-processing recommended to merge token-level spans).
- Trained on a synthetic dataset of over 300,000 samples to improve robustness and generalization.
- On our internal test set, we achieved over 95% accuracy for our specific use case.
Supported entity labels
- PERSON
- PHONE
- ADDRESS (sometimes labelled as LOCATION)
- DATE
- NATIONAL_ID
- HOSPITAL_IDS
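Because the pipeline emits B-/I- prefixed tags and ADDRESS may show up as LOCATION, downstream code may want one canonical label per entity. The snippet below is a minimal sketch of that normalization, assuming tags follow the `B-LABEL` / `I-LABEL` form. The helper name `canonical_label` and the alias table are illustrative and not part of the model or the transformers API.

```python
# Hypothetical helper for illustration; not shipped with the model.
LABEL_ALIASES = {"LOCATION": "ADDRESS"}  # assumption: fold LOCATION into ADDRESS


def canonical_label(tag: str) -> str:
    """Strip a B-/I- prefix and map known aliases to a single label."""
    label = tag[2:] if tag[:2] in ("B-", "I-") else tag
    return LABEL_ALIASES.get(label, label)


print(canonical_label("B-PERSON"))    # PERSON
print(canonical_label("I-LOCATION"))  # ADDRESS
```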
Quick start
Install minimal dependencies:
pip install -U transformers torch
Load and run the model with Hugging Face pipelines:
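from transformers import pipeline

# device=-1 runs inference on the CPU.
ner = pipeline("token-classification", model="loolootech/no-name-ner-th", device=-1)
# Sample doctor-patient exchange that mentions two names (Somchai and Mark).
text = "คุณสมชายเป็นอะไรมาครับวันนี้ อ๋อวันนี้ปวดตับครับ งั้นวันนี้หมอขอตรวจละเอียดหน่อยนะ ได้เลยครับน้องมาร์ค"
results = ner(text)  # list of token-level predictions
print(results)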
Notes on post-processing (more details in our example notebook)
- The pipeline returns token-level predictions (B-/I- style). For redaction or anonymization you should merge adjacent tokens with the same label to form full spans before replacing with entity-specific redaction tokens (e.g. [PERSON], [PHONE]).
- When redacting, replace spans from right-to-left or rebuild the output string from slices to avoid offset shifts.
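The two notes above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes each prediction is a dict with `entity`, `start`, and `end` keys (the shape the token-classification pipeline typically returns with a fast tokenizer), and it treats tokens as adjacent when only whitespace separates them. The function names `merge_spans` and `redact` and the sample tokens are hypothetical. The transformers pipeline also accepts an `aggregation_strategy` argument that groups tokens for you; the manual version here only makes the offset handling explicit.

```python
def merge_spans(tokens, text):
    """Merge adjacent same-label token predictions into (start, end, label) spans."""
    spans = []
    for tok in sorted(tokens, key=lambda t: t["start"]):
        tag = tok["entity"]
        label = tag[2:] if tag[:2] in ("B-", "I-") else tag
        if spans:
            p_start, p_end, p_label = spans[-1]
            # Adjacent means nothing but whitespace lies between the two tokens.
            if p_label == label and text[p_end:tok["start"]].strip() == "":
                spans[-1] = (p_start, max(p_end, tok["end"]), label)
                continue
        spans.append((tok["start"], tok["end"], label))
    return spans


def redact(text, spans):
    """Replace spans with [LABEL], right-to-left so earlier offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Hypothetical token-level predictions for a made-up sentence.
text = "คุณสมชาย โทร 0812345678"
p = text.index("สมชาย")
n = text.index("0812345678")
tokens = [
    {"entity": "B-PERSON", "start": p, "end": p + 2},
    {"entity": "I-PERSON", "start": p + 2, "end": p + len("สมชาย")},
    {"entity": "B-PHONE", "start": n, "end": n + 4},
    {"entity": "I-PHONE", "start": n + 4, "end": n + len("0812345678")},
]

print(redact(text, merge_spans(tokens, text)))  # คุณ[PERSON] โทร [PHONE]
```

Note that this sketch merges on label equality alone, so two different entities of the same type that sit directly next to each other would be joined into one span; splitting on `B-` tags is a possible refinement.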
Disclaimer
This model is intended as an assistive tool for de-identification. It is not a substitute for professional, legal, or medical advice.
Users are fully responsible for ensuring compliance with applicable privacy, legal, and regulatory requirements.
While efforts have been made to improve accuracy, no automated system is 100% reliable. We strongly recommend implementing a regular human review process to validate outputs.
License
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
- For commercial usage, please contact [email protected].
Citation
If you use the model, you can cite it with the following BibTeX entry:
@misc{no_name_ner_th,
  author = {Atirut Boribalburephan and Chiraphat Boonnag and Knot Pipatsrisawat},
  title = {no-name-ner-th},
  year = {2025},
  url = {https://huggingface.co/loolootech/no-name-ner-th},
  publisher = {Hugging Face}
}
Acknowledgement
We extend our gratitude to the PhayaThaiBERT team and Pavarissy/phayathaibert-thainer for providing the initial checkpoint for our model, which served as a crucial starting point. We also acknowledge PyThaiNLP for their invaluable contribution of the thainer-corpus-v2 dataset, which was essential for training and evaluation.
Base model
clicknext/phayathaibert