dolfsai/Qwen3-Embedding-0.6B-vllm-W8A8

Tags: Feature Extraction, 8-bit precision, compressed-tensors

prudant/Qwen3-Embedding-0.6B-W8A8

This is a compressed version of Qwen/Qwen3-Embedding-0.6B, quantized with llm-compressor using the W8A8 scheme (8-bit weights and 8-bit activations).
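To make the W8A8 idea concrete, here is a minimal pure-Python sketch of a linear layer evaluated with int8 weights and int8 activations. It assumes symmetric quantization with one scale per output channel for the weights and one scale for the activation vector, and it uses plain round-to-nearest. The function names are hypothetical and this is not the code llm-compressor runs; it only illustrates the arithmetic.

```python
from typing import List, Tuple

INT8_MAX = 127  # symmetric int8 range used here: [-127, 127]


def quantize_symmetric(values: List[float]) -> Tuple[List[int], float]:
    """Round-to-nearest symmetric int8 quantization with a single scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / INT8_MAX if max_abs > 0 else 1.0
    q = [max(-INT8_MAX, min(INT8_MAX, round(v / scale))) for v in values]
    return q, scale


def w8a8_linear(x: List[float], weight: List[List[float]]) -> List[float]:
    """Approximate y = W @ x with int8 weights and int8 activations.

    weight is laid out as [out_features][in_features]. Each output row gets
    its own scale (assumed per-channel), and the activation vector gets one
    scale (assumed per-token).
    """
    x_q, x_scale = quantize_symmetric(x)
    y = []
    for row in weight:
        w_q, w_scale = quantize_symmetric(row)
        acc = sum(wi * xi for wi, xi in zip(w_q, x_q))  # integer accumulation
        y.append(acc * w_scale * x_scale)  # dequantize back to float
    return y


if __name__ == "__main__":
    x = [0.5, -1.0, 0.25, 2.0]
    W = [[0.1, -0.2, 0.3, 0.4], [-0.5, 0.6, 0.7, -0.8]]
    exact = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    print("float result:", exact)
    print("W8A8 result: ", w8a8_linear(x, W))
```

The quantized result stays close to the float result because the rounding error per element is at most half a quantization step. Calibration-based methods such as GPTQ aim to reduce that error further than round-to-nearest does.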

Important: You MUST read the Guide for correct usage of this model.

Model Details

Original Model: Qwen/Qwen3-Embedding-0.6B
Quantization Method: GPTQ
Compression Libraries: llm-compressor
Calibration Dataset: ultrachat_200k (1024 samples)
Optimized For: Inference with vLLM
License: same as original model
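Since the model is optimized for inference with vLLM, a generic embedding call might look like the sketch below. This is an assumption-laden example, not the procedure from the Guide, which remains the authoritative source for correct usage. It assumes a vLLM version whose `LLM` constructor accepts `task="embed"` (newer releases may expect a different argument) and that the repository ID in the page title is the one to load. `embed_texts` and `cosine_similarity` are hypothetical helper names.

```python
import math
from typing import List, Sequence

# Assumption: repository ID taken from the page title.
MODEL_ID = "dolfsai/Qwen3-Embedding-0.6B-vllm-W8A8"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed_texts(texts: Sequence[str], model_id: str = MODEL_ID) -> List[List[float]]:
    """Embed texts with vLLM's offline API.

    vLLM is imported lazily so the helpers above work without it installed.
    Assumption: task="embed" is accepted by the installed vLLM version.
    """
    from vllm import LLM

    llm = LLM(model=model_id, task="embed")
    outputs = llm.embed(list(texts))
    return [o.outputs.embedding for o in outputs]
```

A typical use would be to call `embed_texts(["a query", "a document"])` on a machine with a supported GPU and compare the two vectors with `cosine_similarity`. Any prompt formatting or extra settings required by this model should come from the Guide.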

Safetensors

Model size: 751M params
Tensor type: BF16, I8

Model tree for dolfsai/Qwen3-Embedding-0.6B-vllm-W8A8

Base model: Qwen/Qwen3-0.6B-Base
Finetuned: Qwen/Qwen3-Embedding-0.6B
Quantized (10): this model

Dataset used to train dolfsai/Qwen3-Embedding-0.6B-vllm-W8A8