---
license: apache-2.0
base_model:
- nomic-ai/nomic-embed-text-v1
pipeline_tag: sentence-similarity
---
# Nomic Embed Text V1 (ONNX)
**Tags:** `text-embedding` `onnx` `nomic-embed-text` `sentence-transformers`
---
## Model Details
- **Model Name:** Nomic Embed Text V1 (ONNX export)
- **Original HF Repo:** [nomic-ai/nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1)
- **ONNX File:** `model.onnx`
- **Export Date:** 2025-05-27
This model outputs:
1. **token_embeddings** — per‐token embedding vectors (`[batch_size, seq_len, hidden_size]`)
2. **sentence_embedding** — pooled sentence‐level embeddings (`[batch_size, hidden_size]`)
---
## Model Description
Nomic Embed Text V1 is a BERT‐style encoder trained to generate high-quality dense representations of text. It is suitable for:
- Semantic search
- Text clustering
- Recommendation systems
- Downstream classification
The ONNX export ensures compatibility with inference engines like [ONNX Runtime](https://www.onnxruntime.ai/) and NVIDIA Triton Inference Server.
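The pooled `sentence_embedding` output is typically produced by attention-mask-weighted mean pooling over the per-token embeddings. A minimal NumPy sketch of that pooling (the exact operator fused into the exported graph may differ):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mean-pool token embeddings, ignoring padded positions.

    token_embeddings: [batch, seq_len, hidden]
    attention_mask:   [batch, seq_len] (1 = real token, 0 = padding)
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # [batch, seq_len, 1]
    summed = (token_embeddings * mask).sum(axis=1)                    # [batch, hidden]
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    return summed / counts

# Toy example: batch of 2, seq_len 3, hidden 4
tok = np.ones((2, 3, 4), dtype=np.float32)
mask = np.array([[1, 1, 0], [1, 1, 1]], dtype=np.int64)
pooled = mean_pool(tok, mask)
print(pooled.shape)  # (2, 4)
```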
---
## Usage
### 1. Install Dependencies
```bash
pip install onnxruntime transformers numpy
```
### 2. Load the ONNX Model
```python
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
```
### 3. Tokenize Inputs
```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
# Note: nomic-embed-text-v1 expects task prefixes on input text
# (e.g. "search_document: ...", "search_query: ..."); see the original model card.
inputs = tokenizer(
    ["Hello world", "Another sentence"],
    padding=True,
    truncation=True,
    return_tensors="np"
)
```
### 4. Run Inference
```python
outputs = session.run(
    ["token_embeddings", "sentence_embedding"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
    },
)
token_embeddings, sentence_embeddings = outputs
```
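The `sentence-similarity` pipeline tag implies comparing the pooled embeddings, usually via cosine similarity after L2 normalization. A small self-contained sketch (dummy vectors stand in for the `sentence_embeddings` produced in step 4):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Dummy stand-ins for sentence_embeddings from the ONNX session
query = np.array([[1.0, 0.0, 0.0]])
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
print(cosine_similarity(query, docs))  # [[1. 0.]]
```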
## Serving with Triton
Place your model files under:

```
models/
└── nomic_embeddings/
    ├── config.pbtxt
    └── 1/
        ├── model.onnx
        └── (tokenizer files…)
```

Create a `config.pbtxt` in the model directory (Triton expects it alongside the version folders, not inside them) that looks something like this:
```protobuf
name: "nomic_embeddings"
backend: "onnxruntime"
max_batch_size: 8
input [
{
name: "input_ids"
    data_type: TYPE_INT64
dims: [-1]
},
{
name: "attention_mask"
    data_type: TYPE_INT64
dims: [-1]
}
]
output [
{
name: "token_embeddings"
data_type: TYPE_FP32
dims: [-1, 768]
},
{
name: "sentence_embedding"
data_type: TYPE_FP32
dims: [-1, 768]
}
]
instance_group [
{
kind: KIND_GPU
count: 1
}
]
```
Start Triton:
```bash
tritonserver \
--model-repository=/path/to/models \
--model-control-mode=explicit \
--load-model=nomic_embeddings
```