|
--- |
|
library_name: optimum.onnxruntime |
|
tags: |
|
- onnx |
|
- int8 |
|
- quantization |
|
- embeddings |
|
- cpu |
|
pipeline_tag: feature-extraction |
|
license: apache-2.0 |
|
base_model: ibm-granite/granite-embedding-english-r2 |
|
--- |
|
|
|
# Granite Embedding English R2 — INT8 (ONNX) |
|
|
|
This is the **INT8-quantized ONNX version** of [`ibm-granite/granite-embedding-english-r2`](https://huggingface.co/ibm-granite/granite-embedding-english-r2). |
|
It is optimized to run efficiently on **CPU** using [🤗 Optimum](https://huggingface.co/docs/optimum) with ONNX Runtime. |
|
|
|
- **Embedding dimension:** 768 |
|
- **Precision:** INT8 (dynamic quantization) |
|
- **Backend:** ONNX Runtime |
|
- **Use case:** text embeddings, semantic search, clustering, retrieval |
|
|
|
--- |
|
|
|
## 📥 Installation |
|
|
|
```bash |
|
pip install -U transformers optimum[onnxruntime] |
|
```
|
|
|
--- |
|
|
|
## 🚀 Usage |
|
|
|
```python |
|
from transformers import AutoTokenizer |
|
from optimum.onnxruntime import ORTModelForFeatureExtraction |
|
|
|
repo_id = "yasserrmd/granite-embedding-r2-onnx" |
|
|
|
# Load tokenizer + ONNX model |
|
tokenizer = AutoTokenizer.from_pretrained(repo_id) |
|
model = ORTModelForFeatureExtraction.from_pretrained(repo_id) |
|
|
|
# Encode sentences |
|
inputs = tokenizer(["Hello world", "Granite embeddings run efficiently on CPU"], padding=True, return_tensors="pt")
|
outputs = model(**inputs) |
|
|
|
# Mean pooling over tokens, weighted by the attention mask so padding is ignored

mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)

embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)  # torch.Size([2, 768])
|
``` |
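
For semantic search or retrieval, the pooled embeddings can be compared with cosine similarity. Below is a minimal sketch under the same setup as above; the `embed` helper, the query, and the document strings are illustrative, not part of the model.

```python
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

repo_id = "yasserrmd/granite-embedding-r2-onnx"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = ORTModelForFeatureExtraction.from_pretrained(repo_id)

def embed(texts):
    # Tokenize, run the ONNX model, and mean-pool with the attention mask
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
    return (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

query = embed(["Which model runs embeddings on CPU?"])
docs = embed([
    "Granite Embedding R2 quantized to INT8 for ONNX Runtime.",
    "A recipe for baking sourdough bread.",
])

# Cosine similarity between the query and each document (higher = more relevant)
scores = torch.nn.functional.cosine_similarity(query, docs)
print(scores)
```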
|
|
|
--- |
|
|
|
## ✅ Notes |
|
|
|
* Dynamic INT8 quantization reduces model size and speeds up CPU inference, typically with only a small loss in embedding quality.
|
* Pooling here is **mean pooling** (attention-mask weighted); you can switch to CLS or max pooling as needed (see the sketch after this list).
|
* Loads directly from the **Hugging Face Hub** via `optimum.onnxruntime`; no manual ONNX export step is needed.
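
A minimal sketch of those alternative pooling strategies, reusing the example sentences from the Usage section (CLS pooling takes the first token's hidden state; max pooling takes the per-dimension maximum over non-padding tokens):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

repo_id = "yasserrmd/granite-embedding-r2-onnx"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = ORTModelForFeatureExtraction.from_pretrained(repo_id)

inputs = tokenizer(["Hello world", "Granite embeddings run efficiently on CPU"], padding=True, return_tensors="pt")
outputs = model(**inputs)

# CLS pooling: hidden state of the first token
cls_embeddings = outputs.last_hidden_state[:, 0]

# Max pooling: mask out padding positions, then take the per-dimension maximum
mask = inputs["attention_mask"].unsqueeze(-1).bool()
masked = outputs.last_hidden_state.masked_fill(~mask, float("-inf"))
max_embeddings = masked.max(dim=1).values

print(cls_embeddings.shape, max_embeddings.shape)  # torch.Size([2, 768]) for both
```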
|
|
|
--- |
|
|
|
## 📚 References |
|
|
|
* [Original Granite Embedding English R2](https://huggingface.co/ibm-granite/granite-embedding-english-r2) |
|
* [Optimum ONNX Runtime docs](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models) |
|
|
|
|