---
license: apache-2.0
base_model:
- nomic-ai/nomic-embed-text-v1
pipeline_tag: sentence-similarity
---
# Nomic Embed Text V1 (ONNX)
**Tags:** `text-embedding` `onnx` `nomic-embed-text` `sentence-transformers`
---
## Model Details
- **Model Name:** Nomic Embed Text V1 (ONNX export)
- **Original HF Repo:** [nomic-ai/nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1)
- **ONNX File:** `model.onnx`
- **Export Date:** 2025-05-27
This model outputs:
1. **token_embeddings** — per‐token embedding vectors (`[batch_size, seq_len, hidden_size]`)
2. **sentence_embedding** — pooled sentence‐level embeddings (`[batch_size, hidden_size]`)
---
## Model Description
Nomic Embed Text V1 is a BERT‐style encoder trained to generate high-quality dense representations of text. It is suitable for:
- Semantic search
- Text clustering
- Recommendation systems
- Downstream classification
This ONNX export can be served by inference engines such as [ONNX Runtime](https://www.onnxruntime.ai/) and the NVIDIA Triton Inference Server.
---
## Usage
### 1. Install Dependencies
```bash
pip install onnxruntime transformers numpy
```
### 2. Load the ONNX Model
```python
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
```
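Optionally, confirm the input and output names the exported graph expects before running inference (`get_inputs()`/`get_outputs()` are standard ONNX Runtime API):
```python
# Sanity check: the names printed here should match the keys and
# output names used in the steps below.
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape)
for out in session.get_outputs():
    print("output:", out.name, out.shape)
```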
### 3. Tokenize Inputs
```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
inputs = tokenizer(
    ["Hello world", "Another sentence"],
    padding=True,
    truncation=True,
    return_tensors="np"
)
```
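Note that the upstream `nomic-embed-text-v1` model card recommends prefixing each text with a task instruction (e.g. `search_document:` for corpus text, `search_query:` for queries). A minimal sketch of that convention, assuming it carries over to this export:
```python
# Task prefixes as recommended by the upstream model card.
docs = [f"search_document: {t}" for t in ["Hello world", "Another sentence"]]
inputs = tokenizer(docs, padding=True, truncation=True, return_tensors="np")
```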
### 4. Run Inference
```python
outputs = session.run(
    ["token_embeddings", "sentence_embedding"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"]
    }
)
token_embeddings, sentence_embeddings = outputs
```
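A common next step is to L2-normalize the pooled embeddings and compare them by cosine similarity. A minimal NumPy sketch, reusing `sentence_embeddings` from the previous step:
```python
import numpy as np

# L2-normalize so that a dot product equals cosine similarity.
norms = np.linalg.norm(sentence_embeddings, axis=1, keepdims=True)
normalized = sentence_embeddings / norms

# Pairwise cosine similarity for the batch.
similarity = normalized @ normalized.T
print(similarity[0, 1])  # similarity between the two example sentences
```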
## Serving with Triton
Place your model files in a Triton model repository. Note that `config.pbtxt` sits next to the version directory, not inside it:
```
models/
└── nomic_embeddings/
    ├── config.pbtxt
    └── 1/
        ├── model.onnx
        └── (tokenizer files…)
```
Create a `config.pbtxt` that looks something like this:
```protobuf
name: "nomic_embeddings"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64  # HF tokenizers emit int64 token ids by default
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "token_embeddings"
    data_type: TYPE_FP32
    dims: [ -1, 768 ]
  },
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    # With max_batch_size set, dims exclude the batch dimension,
    # so the per-sample shape is just the hidden size.
    dims: [ 768 ]
  }
]
instance_group [
  {
    kind: KIND_GPU
    count: 1
  }
]
```
Start Triton:
```bash
tritonserver \
--model-repository=/path/to/models \
--model-control-mode=explicit \
--load-model=nomic_embeddings
```
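Once the server is up, you can query it from Python with the `tritonclient` package (`pip install tritonclient[http]`). A minimal sketch, assuming Triton's default HTTP port 8000 on localhost:
```python
import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

# Tokenization happens client-side; Triton only serves the ONNX graph.
tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
enc = tokenizer(["Hello world"], padding=True, truncation=True, return_tensors="np")

client = httpclient.InferenceServerClient(url="localhost:8000")

inputs = []
for name in ("input_ids", "attention_mask"):
    arr = enc[name].astype(np.int64)  # must match TYPE_INT64 in config.pbtxt
    inp = httpclient.InferInput(name, list(arr.shape), "INT64")
    inp.set_data_from_numpy(arr)
    inputs.append(inp)

result = client.infer(
    model_name="nomic_embeddings",
    inputs=inputs,
    outputs=[httpclient.InferRequestedOutput("sentence_embedding")],
)
print(result.as_numpy("sentence_embedding").shape)  # (1, 768)
```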