---
base_model:
- google-bert/bert-base-uncased
---

# 🔍 BERT Token Classification – Important Chunk Extractor (ONNX)

This model identifies and extracts the important parts of input sentences using BERT-based token classification. It is exported to the ONNX format for optimized inference.

---

## 🧠 Use Case

This model is designed for **context engineering**: extracting semantically important words or chunks from sentences or chat messages, enabling better personalization in downstream applications such as AI assistants and dialogue systems.

Example:

```
Input: I’ll be unavailable tomorrow due to a team offsite.
Output: [unavailable, tomorrow, team offsite]
```

---

## 🛠️ Model Details

* **Architecture**: BERT (`bert-base-uncased`) fine-tuned for token classification
* **Exported to**: ONNX for efficient runtime inference via [Optimum](https://huggingface.co/docs/optimum/onnxruntime)
* **Labels**: `label_list = ["O", "B-IMPORTANT", "I-IMPORTANT"]`, a BIO scheme in which `B-IMPORTANT` marks the first token of an important chunk, `I-IMPORTANT` marks a continuation, and `O` marks everything else (decoded as in the sketch below)
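
Chunks are recovered from these labels by opening a span at each `B-IMPORTANT` token and extending it across any `I-IMPORTANT` tokens that follow. A minimal decoding sketch; the `tokens` and `labels` values here are illustrative, not real model output:

```python
# Minimal BIO decoding sketch: group per-token labels into chunks.
def bio_to_chunks(tokens, labels):
    chunks, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B-IMPORTANT":                # a new chunk opens here
            if current:
                chunks.append(" ".join(current))
            current = [tok]
        elif lab == "I-IMPORTANT" and current:  # extend the open chunk
            current.append(tok)
        else:                                   # "O" (or a stray "I-") closes it
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

tokens = ["i", "will", "be", "unavailable", "tomorrow", "due", "to", "a", "team", "offsite"]
labels = ["O", "O", "O", "B-IMPORTANT", "B-IMPORTANT", "O", "O", "O", "B-IMPORTANT", "I-IMPORTANT"]
print(bio_to_chunks(tokens, labels))  # ['unavailable', 'tomorrow', 'team offsite']
```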

---

## 📦 How to Use (with 🤗 Transformers + Optimum)

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification
import torch

model = ORTModelForTokenClassification.from_pretrained("madhavgohel/bert-token-onnx", file_name="model.onnx")
tokenizer = AutoTokenizer.from_pretrained("madhavgohel/bert-token-onnx")

text = "I'm a software engineer with 5 years experience looking to switch to a data science role."

inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

# Tokens still include [CLS]/[SEP] and "##" wordpieces; see the
# post-processing sketch below for turning them into readable chunks.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# Keep every non-"O" token (label 1 = B-IMPORTANT, label 2 = I-IMPORTANT).
important_tokens = [tok for tok, label in zip(tokens, predictions[0]) if label != 0]
print("Important tokens:", important_tokens)
```
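
The raw tokens above are BERT wordpieces; to produce chunk-style output like the Use Case example, they first need to be merged back into whole words. A rough post-processing sketch continuing from the variables above, taking each word's label from its first wordpiece (a common convention) and assuming `config.json` carries the `id2label` mapping for the three labels:

```python
# Post-processing sketch: merge "##" wordpieces into words and label each
# word by its first wordpiece. Assumes config.json maps ids to the three
# label names; otherwise substitute the label_list from Model Details.
id2label = model.config.id2label

words, word_labels = [], []
for tok, pred in zip(tokens, predictions[0].tolist()):
    if tok in tokenizer.all_special_tokens:  # skip [CLS], [SEP], [PAD]
        continue
    if tok.startswith("##") and words:       # glue wordpiece onto previous word
        words[-1] += tok[2:]
    else:                                    # a new word starts here
        words.append(tok)
        word_labels.append(id2label[pred])

print(list(zip(words, word_labels)))
# These word-level BIO pairs can now be grouped into chunks, e.g. with the
# bio_to_chunks sketch shown in the Model Details section.
```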

---

## 📁 Files Included

| File                      | Purpose                             |
| ------------------------- | ----------------------------------- |
| `model.onnx`              | Exported ONNX model                 |
| `config.json`             | Model config                        |
| `tokenizer_config.json`   | Tokenizer config                    |
| `vocab.txt`               | Vocabulary for BERT tokenizer       |
| `special_tokens_map.json` | Tokenization map for special tokens |
| `README.md`               | Model usage documentation           |
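
For reference, an ONNX file like `model.onnx` can be regenerated from a fine-tuned PyTorch checkpoint with Optimum; a minimal sketch, where the checkpoint path is a placeholder:

```python
# Export sketch (hypothetical local checkpoint path): convert a fine-tuned
# PyTorch token-classification model to ONNX with Optimum, then save it.
from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer

checkpoint = "path/to/finetuned-bert-checkpoint"  # placeholder

model = ORTModelForTokenClassification.from_pretrained(checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

model.save_pretrained("bert-token-onnx")      # writes model.onnx + config.json
tokenizer.save_pretrained("bert-token-onnx")  # writes vocab.txt, tokenizer_config.json, ...
```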