Add ONNX export from Sentence Transformers

#27 opened by alvarobartt (HF Staff)

Hello!

This pull request has been automatically generated by the push_to_hub method of the Sentence Transformers library.

Full Model Architecture:

SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False}) with Transformer model: ORTModelForFeatureExtraction 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
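
For reference, the module stack above can be reproduced by hand with Optimum: run the exported model through ORTModelForFeatureExtraction, take the hidden state of the last non-padding token (last-token pooling), and L2-normalize it. The sketch below makes a few assumptions that are not part of this PR, namely the onnx/model.onnx file location and right-side padding:

import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

# Assumption: the exported file lives at onnx/model.onnx in this revision.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B", revision="refs/pr/27")
ort_model = ORTModelForFeatureExtraction.from_pretrained(
    "Qwen/Qwen3-Embedding-0.6B",
    revision="refs/pr/27",
    file_name="onnx/model.onnx",
)

inputs = tokenizer(["The weather is lovely today."], padding=True, return_tensors="pt")
token_embeddings = ort_model(**inputs).last_hidden_state  # (batch, seq_len, 1024)

# Pooling(pooling_mode_lasttoken=True): take the hidden state of the last
# non-padding token in each sequence (assumes right padding here).
last_token_idx = inputs["attention_mask"].sum(dim=1) - 1
sentence_embeddings = token_embeddings[torch.arange(token_embeddings.shape[0]), last_token_idx]

# Normalize(): L2-normalize so that dot product equals cosine similarity.
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print(sentence_embeddings.shape)  # (1, 1024)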

Tip:

Consider testing this pull request before merging by loading the model from this PR with the revision argument:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "Qwen/Qwen3-Embedding-0.6B",
    revision="refs/pr/27",
    backend="onnx",
)

# Verify that everything works as expected
embeddings = model.encode(["The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium."])
print(embeddings.shape)

similarities = model.similarity(embeddings, embeddings)
print(similarities)
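
With three input sentences and the 1024-dimensional pooling output shown above, embeddings.shape should print (3, 1024). The upstream Qwen3-Embedding model card also defines a "query" prompt for retrieval; assuming that prompt is carried over in this revision, the snippet above can be continued like this:

# Assumes the "query" prompt from the upstream Qwen3-Embedding configuration
# is available in this revision; continues from the snippet above.
query_embeddings = model.encode(["Which sentence is about sports?"], prompt_name="query")
print(model.similarity(query_embeddings, embeddings))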

alvarobartt changed the pull request title from "Add new SentenceTransformer model with an onnx backend" to "Add ONNX export from Sentence Transformers"

Run it with Text Embeddings Inference (TEI) on CPU with the ONNX backend as follows:

docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 --model-id Qwen/Qwen3-Embedding-0.6B --revision refs/pr/27
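
Once the container is running, embeddings can be requested over TEI's HTTP API, e.g. via the /embed route (a minimal sketch; the two example inputs are arbitrary):

# Minimal sketch: query the TEI container started above via its /embed route.
import requests

response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": ["The weather is lovely today.", "It's so sunny outside!"]},
)
embeddings = response.json()  # list of 1024-dimensional float vectors
print(len(embeddings), len(embeddings[0]))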
