---
license: apache-2.0
base_model:
- nomic-ai/nomic-embed-text-v1
pipeline_tag: sentence-similarity
---
# Nomic Embed Text V1 (ONNX)

**Tags:** `text-embedding` `onnx` `nomic-embed-text` `sentence-transformers`

---

## Model Details

- **Model Name:** Nomic Embed Text V1 (ONNX export)  
- **Original HF Repo:** [nomic-ai/nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1)  
- **ONNX File:** `model.onnx`  
- **Export Date:** 2025-05-27  

This model outputs:
1. **token_embeddings** — per‐token embedding vectors (`[batch_size, seq_len, hidden_size]`)  
2. **sentence_embedding** — pooled sentence‐level embeddings (`[batch_size, hidden_size]`)  

---

## Model Description

Nomic Embed Text V1 is a BERT‐style encoder trained to generate high-quality dense representations of text. It is suitable for:

- Semantic search  
- Text clustering  
- Recommendation systems  
- Downstream classification  

The ONNX export lets the model run under inference engines such as [ONNX Runtime](https://www.onnxruntime.ai/) and NVIDIA Triton Inference Server.

---

## Usage

### 1. Install Dependencies

```bash
pip install onnxruntime transformers numpy
```

### 2. Load the ONNX Model

```python
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
```
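
By default, ONNX Runtime executes on CPU. If you have a GPU and the `onnxruntime-gpu` package installed, you can request the CUDA execution provider with a CPU fallback (a minimal sketch; ONNX Runtime falls back gracefully when CUDA is unavailable):

```python
import onnxruntime as ort

# Ask for CUDA first; ONNX Runtime falls back to CPU if it is unavailable
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # lists the providers actually in use
```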

### 3. Tokenize Inputs

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
inputs = tokenizer(
    ["Hello world", "Another sentence"],
    padding=True,
    truncation=True,
    return_tensors="np"
)
```
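
If you are unsure which tensor names your export uses (they must match the names passed to `session.run` in the next step), ONNX Runtime can list them:

```python
# Print the graph's declared inputs and outputs with their shapes and dtypes
for inp in session.get_inputs():
    print("input: ", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)
```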

### 4. Run Inference

```python
outputs = session.run(
    ["token_embeddings", "sentence_embedding"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"]
    }
)

token_embeddings, sentence_embeddings = outputs
```
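
For semantic search or clustering, the pooled embeddings are usually L2-normalized and compared by cosine similarity. A minimal numpy sketch using the `sentence_embeddings` from above:

```python
import numpy as np

# L2-normalize each row; pairwise dot products then give cosine similarities
normalized = sentence_embeddings / np.linalg.norm(
    sentence_embeddings, axis=1, keepdims=True
)
similarity = normalized @ normalized.T
print(similarity[0, 1])  # similarity of "Hello world" vs. "Another sentence"
```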

---

## Serving with Triton

Place your model files under:

```
models/
└── nomic_embeddings/
    └── 1/
        ├── model.onnx
        ├── config.pbtxt
        └── (tokenizer files…)
```

Create a `config.pbtxt` along the following lines. Note that when `max_batch_size` is greater than zero, Triton adds the batch dimension implicitly, so the `dims` below describe per-request shapes without the leading batch axis:

```protobuf
name: "nomic_embeddings"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [-1]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT32
    dims: [-1]
  }
]

output [
  {
    name: "token_embeddings"
    data_type: TYPE_FP32
    dims: [-1, 768]
  },
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    dims: [768]
  }
]

instance_group [
  {
    kind: KIND_GPU
    count: 1
  }
]
```

Start Triton:

```bash
tritonserver \
  --model-repository=/path/to/models \
  --model-control-mode=explicit \
  --load-model=nomic_embeddings
```
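
Once the server reports the model as READY, you can query it over HTTP. A minimal sketch using `tritonclient` (install with `pip install tritonclient[http]`; assumes Triton's default HTTP port 8000). Note the cast to `int32` to match the `TYPE_INT32` inputs declared in `config.pbtxt`:

```python
import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

client = httpclient.InferenceServerClient(url="localhost:8000")
tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
enc = tokenizer(["Hello world"], padding=True, truncation=True, return_tensors="np")

# Build request inputs; dtypes must match the config.pbtxt declarations
inputs = []
for name in ("input_ids", "attention_mask"):
    arr = enc[name].astype(np.int32)
    inp = httpclient.InferInput(name, arr.shape, "INT32")
    inp.set_data_from_numpy(arr)
    inputs.append(inp)

result = client.infer(
    "nomic_embeddings",
    inputs,
    outputs=[httpclient.InferRequestedOutput("sentence_embedding")],
)
print(result.as_numpy("sentence_embedding").shape)  # e.g. (1, 768)
```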