---
library_name: optimum.onnxruntime
tags:
- onnx
- int8
- quantization
- embeddings
- cpu
pipeline_tag: feature-extraction
license: apache-2.0
base_model: ibm-granite/granite-embedding-english-r2
---
# Granite Embedding English R2 — INT8 (ONNX)
This is the **INT8-quantized ONNX version** of [`ibm-granite/granite-embedding-english-r2`](https://huggingface.co/ibm-granite/granite-embedding-english-r2).
It is optimized to run efficiently on **CPU** using [🤗 Optimum](https://huggingface.co/docs/optimum) with ONNX Runtime.
- **Embedding dimension:** 768
- **Precision:** INT8 (dynamic quantization; see the sketch below)
- **Backend:** ONNX Runtime
- **Use case:** text embeddings, semantic search, clustering, retrieval
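As a rough sketch of how a model like this can be exported and quantized with Optimum (the exact recipe used for this repository is not documented here; the `avx512_vnni` config and output directories below are assumptions):
```python
from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the base model to ONNX
model = ORTModelForFeatureExtraction.from_pretrained(
    "ibm-granite/granite-embedding-english-r2", export=True
)
model.save_pretrained("granite-r2-onnx")  # hypothetical output directory

# Dynamic INT8 quantization: weights are quantized ahead of time and
# activations on the fly, so no calibration dataset is needed.
# The avx512_vnni target is an assumption; pick the config for your CPU.
quantizer = ORTQuantizer.from_pretrained("granite-r2-onnx")
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="granite-r2-onnx-int8", quantization_config=dqconfig)
```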
---
## 📥 Installation
```bash
pip install -U transformers "optimum[onnxruntime]"
```
---
## 🚀 Usage
```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

repo_id = "yasserrmd/granite-embedding-r2-onnx"

# Load tokenizer + ONNX model
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = ORTModelForFeatureExtraction.from_pretrained(repo_id)

# Encode sentences
inputs = tokenizer(["Hello world", "Hello"], padding=True, return_tensors="pt")
outputs = model(**inputs)

# Mean pooling over tokens, masking out padding so it does not skew the average
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)  # torch.Size([2, 768])
```
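For the semantic search and retrieval use cases listed above, a common next step is to L2-normalize the pooled embeddings and rank documents by cosine similarity. A minimal sketch, reusing `tokenizer` and `model` from above (the `embed` helper and the query/document strings are illustrative, not part of this repository):
```python
import torch.nn.functional as F

query = ["What is the capital of France?"]
docs = ["Paris is the capital of France.", "The Nile flows through Egypt."]

def embed(texts):
    enc = tokenizer(texts, padding=True, return_tensors="pt")
    out = model(**enc)
    mask = enc["attention_mask"].unsqueeze(-1)
    emb = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
    return F.normalize(emb, p=2, dim=1)  # unit vectors, so dot product = cosine

scores = embed(query) @ embed(docs).T
print(scores)  # higher score = more semantically similar
```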
---
## ✅ Notes
* Dynamic INT8 quantization shrinks the model weights and speeds up inference on CPUs, typically with only a small loss in accuracy.
* The pooling strategy used here is attention-mask-aware **mean pooling**; CLS pooling or max pooling can be substituted as needed (see the sketch after this list).
* Works seamlessly with **Hugging Face Hub** + `optimum.onnxruntime`.
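Reusing `inputs` and `outputs` from the Usage section, the alternative pooling strategies can be sketched as follows (which strategy works best depends on how the base model was trained; check the upstream model card):
```python
# CLS pooling: take the hidden state of the first token
cls_embeddings = outputs.last_hidden_state[:, 0]

# Max pooling: fill padding positions with -inf so they never win the max
mask = inputs["attention_mask"].unsqueeze(-1).bool()
max_embeddings = outputs.last_hidden_state.masked_fill(~mask, float("-inf")).max(dim=1).values
```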
---
## 📚 References
* [Original Granite Embedding English R2](https://huggingface.co/ibm-granite/granite-embedding-english-r2)
* [Optimum ONNX Runtime docs](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models)