metadata
license: apache-2.0
base_model: Qwen/Qwen3-Embedding-0.6B
tags:
  - qwen3
  - text-embeddings-inference
  - onnx
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
language:
  - multilingual
pipeline_tag: sentence-similarity
library_name: sentence-transformers

Qwen3-Embedding-0.6B ONNX for TEI

This is an ONNX export of Qwen/Qwen3-Embedding-0.6B, packaged to run under Text Embeddings Inference (TEI).

Model Details

  • Base Model: Qwen/Qwen3-Embedding-0.6B
  • Format: ONNX with external data (model.onnx + model.onnx_data)
  • Pooling: Mean pooling (built into the ONNX graph)
  • Embedding Dimension: 1024
  • Max Sequence Length: 32768 tokens
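As an illustration of the pooling step listed above, the following is a minimal pure-Python sketch of what mean pooling computes: the average of the token vectors whose attention-mask entry is 1, with padding positions ignored. The function name `mean_pool` and the toy inputs are hypothetical. The exported graph performs this step internally, and its exact operators may differ from this sketch.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is non-zero.

    Illustrative only: this mirrors the idea of masked mean pooling,
    not the actual operators inside the ONNX graph.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if kept == 0:
        raise ValueError("attention_mask selects no tokens")
    return [total / kept for total in summed]


if __name__ == "__main__":
    # Three token vectors of dimension 2; the last one is padding.
    tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 0]
    print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

With this model the pooled vector would have 1024 entries rather than 2; the toy dimension only keeps the arithmetic easy to check by hand.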

Usage with TEI

For GPU inference:

docker run --gpus all -p 8080:80 -v "$PWD:/data" \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id janni-t/qwen3-embedding-0.6b-tei-onnx

For CPU inference:

docker run -p 8080:80 -v "$PWD:/data" \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id janni-t/qwen3-embedding-0.6b-tei-onnx
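Once a container is running, embeddings can be requested from TEI's `/embed` endpoint. The sketch below uses only the Python standard library and assumes the server is reachable at `http://localhost:8080`, matching the `-p 8080:80` mapping above. The helper names (`build_embed_request`, `embed`, `cosine_similarity`) and the example sentences are hypothetical, chosen for illustration.

```python
import json
import math
import urllib.request
from typing import List, Sequence

# Assumption: the container was started with -p 8080:80 on this machine.
TEI_URL = "http://localhost:8080"


def build_embed_request(texts: List[str], base_url: str = TEI_URL) -> urllib.request.Request:
    """Build a POST request for TEI's /embed endpoint."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: List[str], base_url: str = TEI_URL, timeout: float = 30.0) -> List[List[float]]:
    """Send the texts to the server and return one vector per input text."""
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


if __name__ == "__main__":
    sentences = ["What is the capital of France?", "Paris is the capital of France."]
    try:
        vectors = embed(sentences)
    except OSError as exc:  # covers connection refused and timeouts
        print(f"Could not reach the TEI server at {TEI_URL}: {exc}")
    else:
        print("dimension:", len(vectors[0]))
        print("similarity:", cosine_similarity(vectors[0], vectors[1]))
```

If the server is up, the printed dimension should match the embedding dimension listed under Model Details. The similarity helper is written out explicitly so the example does not depend on whether the returned vectors are already normalized.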

Conversion Details

This model was converted from the original PyTorch model to ONNX format with:

  • Consolidated external data for TEI compatibility
  • Mean pooling integrated into the ONNX graph
  • Optimized for CPU inference

Original Model

See the original model card at Qwen/Qwen3-Embedding-0.6B for:

  • Model architecture details
  • Training information
  • Benchmark results
  • Citation information

License

Apache 2.0 (same as the original model)