---
license: apache-2.0
base_model:
- nomic-ai/nomic-embed-text-v1
pipeline_tag: sentence-similarity
---

# Nomic Embed Text V1 (ONNX)

**Tags:** `text-embedding` `onnx` `nomic-embed-text` `sentence-transformers`

---

## Model Details

- **Model Name:** Nomic Embed Text V1 (ONNX export)
- **Original HF Repo:** [nomic-ai/nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1)
- **ONNX File:** `model.onnx`
- **Export Date:** 2025-05-27

This model outputs:

1. **token_embeddings** — per-token embedding vectors (`[batch_size, seq_len, hidden_size]`)
2. **sentence_embedding** — pooled sentence-level embeddings (`[batch_size, hidden_size]`)

---

## Model Description

Nomic Embed Text V1 is a BERT-style encoder trained to generate high-quality dense representations of text. It is suitable for:

- Semantic search
- Text clustering
- Recommendation systems
- Downstream classification

The ONNX export ensures compatibility with inference engines like [ONNX Runtime](https://www.onnxruntime.ai/) and NVIDIA Triton Inference Server.
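As a concrete illustration of the semantic-search use case, documents can be ranked by cosine similarity between their embeddings and a query embedding. A minimal numpy sketch (toy 4-dimensional vectors stand in for real 768-dimensional model outputs):

```python
import numpy as np

def cosine_top_k(query_vec, corpus_vecs, k=2):
    """Rank corpus rows by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per corpus row
    top = np.argsort(-scores)[:k]       # indices of the k best matches
    return top, scores[top]

# Toy "embeddings"; in practice these come from the model.
corpus = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
idx, scores = cosine_top_k(query, corpus)
print(idx)  # nearest corpus rows first: [0 1]
```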

---

## Usage

### 1. Install Dependencies

```bash
pip install onnxruntime transformers numpy
```

### 2. Load the ONNX Model

```python
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
```

### 3. Tokenize Inputs

Nomic Embed expects a task prefix on every input text (e.g. `search_document: ` for corpus passages, `search_query: ` for queries):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
inputs = tokenizer(
    ["search_document: Hello world", "search_document: Another sentence"],
    padding=True,
    truncation=True,
    return_tensors="np"
)
```

### 4. Run Inference

```python
outputs = session.run(
    ["token_embeddings", "sentence_embedding"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"]
    }
)

token_embeddings, sentence_embeddings = outputs
```
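The exported graph already returns the pooled `sentence_embedding`, but if you only have `token_embeddings` you can reproduce the pooling yourself. Assuming mean pooling over non-padding tokens (the usual strategy for this model family), a numpy sketch with synthetic arrays:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions.

    token_embeddings: [batch, seq_len, hidden]; attention_mask: [batch, seq_len]
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # [batch, seq_len, 1]
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid divide-by-zero
    return summed / counts

# Tiny synthetic check: second sequence has one padded position.
toks = np.ones((2, 3, 4), dtype=np.float32)
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool(toks, mask)
print(pooled.shape)  # (2, 4)
```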

## Serving with Triton

Place your model files under:

```
models/
└── nomic_embeddings/
    ├── config.pbtxt
    └── 1/
        ├── model.onnx
        └── (tokenizer files…)
```

Note that Triton expects `config.pbtxt` at the model level, next to the numbered version directory, not inside it.

Create a `config.pbtxt` file that looks something like this:

```protobuf
name: "nomic_embeddings"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]

output [
  {
    name: "token_embeddings"
    data_type: TYPE_FP32
    dims: [ -1, 768 ]
  },
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]

instance_group [
  {
    kind: KIND_GPU
    count: 1
  }
]
```

Because `max_batch_size` is set, the batch dimension is implicit and `dims` describe a single request item: `sentence_embedding` is therefore `[ 768 ]`, not `[ -1, 768 ]`. The token-ID inputs are declared `TYPE_INT64` to match the int64 arrays Hugging Face tokenizers produce; if your ONNX export was created with 32-bit inputs, use `TYPE_INT32` instead and cast on the client side.

Start Triton:

```bash
tritonserver \
  --model-repository=/path/to/models \
  --model-control-mode=explicit \
  --load-model=nomic_embeddings
```
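Once the server is up, a client can query it over HTTP with the `tritonclient` package (`pip install tritonclient[http]`). A sketch, assuming Triton is reachable at `localhost:8000` and that the config declares `TYPE_INT64` inputs as above; the placeholder token IDs stand in for real tokenizer output, and the snippet only runs against a live server:

```python
import numpy as np
import tritonclient.http as httpclient

# Placeholder token IDs and mask; real requests use the tokenizer output
# from the Usage section above.
input_ids = np.array([[101, 7592, 2088, 102]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

client = httpclient.InferenceServerClient(url="localhost:8000")

triton_inputs = [
    httpclient.InferInput("input_ids", list(input_ids.shape), "INT64"),
    httpclient.InferInput("attention_mask", list(attention_mask.shape), "INT64"),
]
triton_inputs[0].set_data_from_numpy(input_ids)
triton_inputs[1].set_data_from_numpy(attention_mask)

result = client.infer(
    "nomic_embeddings",
    inputs=triton_inputs,
    outputs=[httpclient.InferRequestedOutput("sentence_embedding")],
)
# One pooled vector per input sequence, hidden size 768.
print(result.as_numpy("sentence_embedding").shape)
```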