No Name Thai NER

A compact Thai token-classification model optimized for fast named-entity recognition (NER) and practical de-identification of medical text. This checkpoint was trained to detect entities in Thai clinical and conversational text and is intended for use in context-preserving anonymization pipelines.

At Looloo Health, we're passionate about making healthcare more accessible and affordable for everyone. The model is a core component of our AI Medical Scribe, PresScribe, where it helps ensure patient privacy through automated de-identification. We believe that unlocking the potential of clinical data is key to this goal, and we're excited to share our work with the community.

Features

  • Detects common sensitive entity types found in medical text (names, phone numbers, IDs, addresses, dates, etc.).
  • Lightweight and fast to run on CPUs with the Hugging Face transformers pipeline.
  • Designed to be used as part of a de-identification workflow (post-processing is recommended to merge token-level predictions into spans).
  • Trained on a synthetic dataset of over 300,000 samples to improve robustness and generalization.
  • On our internal test set, the model achieved over 95% accuracy for our specific use case.

Supported entity labels

  • PERSON
  • PHONE
  • EMAIL
  • ADDRESS (sometimes labelled as LOCATION)
  • DATE
  • NATIONAL_ID
  • HOSPITAL_IDS
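
Because the raw tags carry a B-/I- prefix (see the post-processing notes) and ADDRESS can also appear as LOCATION, downstream code usually wants one canonical label per entity type. The sketch below shows one way to normalize tags. `normalize_label`, `LABEL_ALIASES`, and `SUPPORTED_LABELS` are hypothetical names introduced for illustration, and the exact tag strings are an assumption, so check them against the checkpoint's `config.id2label` before relying on this.

```python
# Canonical labels taken from the list above.
SUPPORTED_LABELS = {
    "PERSON", "PHONE", "EMAIL", "ADDRESS", "DATE", "NATIONAL_ID", "HOSPITAL_IDS",
}

# Assumption: LOCATION is the only alias that needs mapping.
LABEL_ALIASES = {"LOCATION": "ADDRESS"}


def normalize_label(tag: str) -> str:
    """Strip a leading B-/I- prefix and map known aliases to the canonical label."""
    label = tag[2:] if tag[:2] in ("B-", "I-") else tag
    return LABEL_ALIASES.get(label, label)


print(normalize_label("B-LOCATION"))
print(normalize_label("I-NATIONAL_ID") in SUPPORTED_LABELS)
```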

Quick start

Install minimal dependencies:

pip install -U transformers torch

Load and run the model with Hugging Face pipelines:

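from transformers import pipeline

# The repository is gated: accept the conditions on the model page and log in to Hugging Face first.
# device=-1 runs inference on the CPU.
ner = pipeline("token-classification", model="loolootech/no-name-ner-th", device=-1)
# Sample doctor-patient dialogue that mentions two names (สมชาย and มาร์ค).
text = "คุณสมชายเป็นอะไรมาครับวันนี้ อ๋อวันนี้ปวดตับครับ งั้นวันนี้หมอขอตรวจละเอียดหน่อยนะ ได้เลยครับน้องมาร์ค"
results = ner(text)  # list of token-level predictions
print(results)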

Notes on post-processing (more details in our example notebook)

  • The pipeline returns token-level predictions (B-/I- style). For redaction or anonymization you should merge adjacent tokens with the same label to form full spans before replacing with entity-specific redaction tokens (e.g. [PERSON], [PHONE]).
  • When redacting, replace spans from right-to-left or rebuild the output string from slices to avoid offset shifts.
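
The two notes above can be sketched as follows. This is a minimal illustration, not necessarily the post-processing used in the example notebook. `merge_token_spans`, `redact`, and the `max_gap` parameter are hypothetical names, and the hand-written `predictions` list only mimics the `entity`, `start`, and `end` fields that the transformers token-classification pipeline returns when the tokenizer provides character offsets.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label) in character offsets


def merge_token_spans(predictions: List[Dict], max_gap: int = 1) -> List[Span]:
    """Merge adjacent token predictions that share a label into full spans.

    max_gap (an assumption) is how many characters, such as a space, may
    separate two tokens that still belong to the same span.
    """
    spans: List[Span] = []
    for pred in sorted(predictions, key=lambda p: p["start"]):
        tag = pred["entity"]
        # Drop the B-/I- prefix so only the entity type is compared.
        label = tag[2:] if tag[:2] in ("B-", "I-") else tag
        if label == "O":
            continue
        if spans and spans[-1][2] == label and pred["start"] - spans[-1][1] <= max_gap:
            start, end, _ = spans[-1]
            spans[-1] = (start, max(end, pred["end"]), label)
        else:
            spans.append((pred["start"], pred["end"], label))
    return spans


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with [LABEL], working right to left so offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Hand-made predictions standing in for real pipeline output.
text = "คุณสมชาย โทร 081-234-5678"
name_start = text.index("สมชาย")
phone_start = text.index("081")
predictions = [
    {"entity": "B-PERSON", "start": name_start, "end": name_start + 2},
    {"entity": "I-PERSON", "start": name_start + 2, "end": name_start + len("สมชาย")},
    {"entity": "B-PHONE", "start": phone_start, "end": phone_start + 4},
    {"entity": "I-PHONE", "start": phone_start + 4, "end": len(text)},
]

spans = merge_token_spans(predictions)
redacted = redact(text, spans)
print(redacted)
```

This sketch merges by label and adjacency only and ignores the difference between B- and I- tags, so two touching entities of the same type collapse into a single redaction token. The transformers pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens for you; whether its grouping suits this checkpoint is something to verify on your own data.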

Disclaimer

  • This model is intended as an assistive tool for de-identification. It is not a substitute for professional, legal, or medical advice.

  • Users are fully responsible for ensuring compliance with applicable privacy, legal, and regulatory requirements.

  • While efforts have been made to improve accuracy, no automated system is 100% reliable. We strongly recommend implementing a regular human review process to validate outputs.

License

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

Citation

If you use this model, please cite it with the following BibTeX entry.

@misc {no_name_ner_th,
    author       = { Atirut Boribalburephan and Chiraphat Boonnag and Knot Pipatsrisawat },
    title        = { no-name-ner-th },
    year         = 2025,
    url          = { https://huggingface.co/loolootech/no-name-ner-th },
    publisher    = { Hugging Face }
}

Acknowledgement

We extend our gratitude to the PhayaThaiBERT team and Pavarissy/phayathaibert-thainer for providing the initial checkpoint for our model, which served as a crucial starting point. We also acknowledge PyThaiNLP for their invaluable contribution of the thainer-corpus-v2 dataset, which was essential for training and evaluation.

Model size: 277M parameters (Safetensors, tensor type F32)