Add ONNX export from Sentence Transformers
#27 opened by alvarobartt (HF Staff)
Hello!
This pull request has been automatically generated by the push_to_hub
method of the Sentence Transformers library.
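For context, an export like this can be reproduced locally: loading the model with backend="onnx" makes Sentence Transformers export the weights to ONNX (via Optimum) when no ONNX file is present yet, and push_to_hub with create_pr=True opens a pull request such as this one instead of committing to main. A minimal sketch (the create_pr flag is available in recent Sentence Transformers releases):
from sentence_transformers import SentenceTransformer

# Loading with the ONNX backend triggers an export if no ONNX weights exist yet
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", backend="onnx")

# Push the exported ONNX weights back to the Hub as a pull request
model.push_to_hub("Qwen/Qwen3-Embedding-0.6B", create_pr=True)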
Full Model Architecture:
SentenceTransformer(
(0): Transformer({'max_seq_length': 32768, 'do_lower_case': False}) with Transformer model: ORTModelForFeatureExtraction
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
(2): Normalize()
)
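For reference, the Pooling and Normalize modules above boil down to taking the hidden state of the last non-padding token and L2-normalizing it. A minimal sketch of that behavior, assuming right-padded inputs and a hypothetical last_token_pool helper (not the library's internal implementation):
import torch
import torch.nn.functional as F

def last_token_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    # Index of the last non-padding token in each right-padded sequence
    last_idx = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(token_embeddings.size(0))
    pooled = token_embeddings[batch_idx, last_idx]  # (batch, 1024)
    return F.normalize(pooled, p=2, dim=1)          # unit-length embeddings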
Tip: Consider testing this pull request before merging by loading the model from this PR with the revision argument:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer(
"Qwen/Qwen3-Embedding-0.6B",
revision="refs/pr/27",
backend="onnx",
)
# Verify that everything works as expected
embeddings = model.encode(["The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium."])
print(embeddings.shape)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
alvarobartt changed the pull request title from "Add new SentenceTransformer model with an onnx backend" to "Add ONNX export from Sentence Transformers".
Run it with Text Embeddings Inference (TEI) on CPU with the ONNX backend as follows:
docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 --model-id Qwen/Qwen3-Embedding-0.6B --revision refs/pr/27
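Once the container is running, the deployment can be sanity-checked by posting to TEI's embed route (standard TEI API; the port below matches the -p 8080:80 mapping above):
import requests

# Request a single embedding from the TEI server started above
response = requests.post(
    "http://127.0.0.1:8080/embed",
    json={"inputs": "The weather is lovely today."},
)
embedding = response.json()[0]
print(len(embedding))  # expected to match the model's 1024-dimensional output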