---
base_model:
- google-bert/bert-base-uncased
---
# 🔍 BERT Token Classification – Important Chunk Extractor (ONNX)
This model identifies and extracts important parts of input sentences using BERT-based token classification, exported to the ONNX format for optimized inference.
---
## 🧠 Use Case
This model is designed for **context engineering** — to extract semantically important words or chunks from sentences or chat messages, enabling better personalization in downstream applications like AI assistants or dialogue systems.
Example:
```
Input: I’ll be unavailable tomorrow due to a team offsite.
Output: [unavailable, tomorrow, team offsite]
```
---
## 🛠️ Model Details
* **Architecture**: BERT (`bert-base-uncased`) fine-tuned for token classification
* **Exported to**: ONNX for efficient runtime inference via [Optimum](https://huggingface.co/docs/optimum/onnxruntime) (see the export sketch after this list)
* **Labels**: `["O", "B-IMPORTANT", "I-IMPORTANT"]` (BIO scheme: `B-IMPORTANT` marks the first token of an important chunk, `I-IMPORTANT` marks its continuation)
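The export step itself is not shipped in this repo, but for reference, Optimum can convert a fine-tuned PyTorch checkpoint to ONNX on load. A minimal sketch, assuming a hypothetical local checkpoint directory (the path is illustrative, not part of this repo):

```python
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer

# Hypothetical path to a BERT checkpoint fine-tuned for token classification
checkpoint = "path/to/finetuned-bert-important-chunks"

# export=True runs the PyTorch -> ONNX conversion when loading
ort_model = ORTModelForTokenClassification.from_pretrained(checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Writes model.onnx plus the config/tokenizer files alongside it
ort_model.save_pretrained("bert-token-onnx")
tokenizer.save_pretrained("bert-token-onnx")
```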
---
## 📦 How to Use (with 🤗 Transformers + Optimum)
```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification
import torch

# Load the ONNX model and its tokenizer from the Hub
model = ORTModelForTokenClassification.from_pretrained("madhavgohel/bert-token-onnx", file_name="model.onnx")
tokenizer = AutoTokenizer.from_pretrained("madhavgohel/bert-token-onnx")

text = "I'm a software engineer with 5 years experience looking to switch to a data science role."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Per-token label IDs: 0 = O, 1 = B-IMPORTANT, 2 = I-IMPORTANT
predictions = torch.argmax(outputs.logits, dim=-1)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Keep tokens tagged B-IMPORTANT or I-IMPORTANT (not just B-), skipping [CLS]/[SEP]
important_tokens = [
    tok for tok, label in zip(tokens, predictions[0].tolist())
    if label in (1, 2) and tok not in tokenizer.all_special_tokens
]
print("Important tokens:", important_tokens)
```
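The snippet above prints raw WordPiece tokens. To recover readable chunks like the use-case example, you can merge `##` subword pieces and group consecutive `B-IMPORTANT`/`I-IMPORTANT` tokens into spans. A minimal decoding sketch that continues from the variables above (the helper name is ours, not part of the model):

```python
def decode_chunks(tokens, label_ids):
    """Merge WordPiece tokens and group B-/I-IMPORTANT tags into text chunks."""
    chunks, current = [], []
    for tok, label in zip(tokens, label_ids):
        if label == 1:  # B-IMPORTANT starts a new chunk
            if current:
                chunks.append(" ".join(current))
            current = [tok]
        elif label == 2 and current:  # I-IMPORTANT continues the open chunk
            if tok.startswith("##"):
                current[-1] += tok[2:]  # glue subword piece onto the previous word
            else:
                current.append(tok)
        else:  # O (or a stray I- with no open chunk) closes any open chunk
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

chunks = decode_chunks(tokens, predictions[0].tolist())
print("Important chunks:", chunks)
```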
---
## 📁 Files Included
| File | Purpose |
| ------------------------- | ----------------------------------- |
| `model.onnx` | Exported ONNX model |
| `config.json` | Model config |
| `tokenizer_config.json` | Tokenizer config |
| `vocab.txt` | Vocabulary for BERT tokenizer |
| `special_tokens_map.json` | Mapping of special tokens (`[CLS]`, `[SEP]`, etc.) |
| `README.md` | Model usage documentation |