---
language:
- en
license: apache-2.0
tags:
- reranker
- cross-encoder
- sequence-classification
- vllm
base_model: Qwen/Qwen3-Reranker-4B
pipeline_tag: text-classification
---
# Qwen3-Reranker-4B-seq-cls-vllm-fixed
This is a fixed version of Qwen3-Reranker-4B, converted to sequence-classification format and configured for use with vLLM.
## Model Description
This model is a pre-converted version of [Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) that:
- Has been converted from CausalLM to SequenceClassification architecture
- Includes proper configuration for vLLM compatibility
- Provides ~75,000x reduction in classification head size
- Offers ~150,000x fewer operations per token compared to using the full LM head
## Key Improvements
The original converted model ([tomaarsen/Qwen3-Reranker-4B-seq-cls](https://huggingface.co/tomaarsen/Qwen3-Reranker-4B-seq-cls)) was missing critical vLLM configuration attributes. This version adds:
```json
{
  "classifier_from_token": ["no", "yes"],
  "method": "from_2_way_softmax",
  "use_pad_token": false,
  "is_original_qwen3_reranker": false
}
```
These configurations are essential for vLLM to properly handle the pre-converted weights.
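As a quick sanity check, the presence of these attributes in a downloaded `config.json` can be verified with a small helper (illustrative only, not part of vLLM):

```python
import json

# The four attributes vLLM needs to interpret the pre-converted weights
REQUIRED_KEYS = {
    "classifier_from_token",
    "method",
    "use_pad_token",
    "is_original_qwen3_reranker",
}

def missing_vllm_keys(config: dict) -> set:
    """Return any required vLLM reranker attributes absent from a config dict."""
    return REQUIRED_KEYS - config.keys()

# Example: the config fragment shipped with this model
config = json.loads("""{
    "classifier_from_token": ["no", "yes"],
    "method": "from_2_way_softmax",
    "use_pad_token": false,
    "is_original_qwen3_reranker": false
}""")
print(missing_vllm_keys(config))  # set() -> nothing missing
```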
## Usage with vLLM
```bash
vllm serve danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed \
  --task score \
  --served-model-name qwen3-reranker-4b \
  --disable-log-requests
```
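Once the server is up, scores can be requested over HTTP. This sketch targets vLLM's OpenAI-compatible `/score` endpoint; the payload fields follow vLLM's score API and may vary across versions:

```shell
curl -s http://localhost:8000/score \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3-reranker-4b",
        "text_1": "What is the capital of France?",
        "text_2": "Paris is the capital of France."
      }'
```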
### Python Example
```python
from vllm import LLM

llm = LLM(
    model="danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed",
    task="score",
)

queries = ["What is the capital of France?"]
documents = ["Paris is the capital of France."]

outputs = llm.score(queries, documents)
scores = [output.outputs.score for output in outputs]
print(scores)
```
## Performance
This model performs identically to the original Qwen3-Reranker-4B when used with proper configuration, while providing significant efficiency improvements:
- **Memory**: classification head shrinks from ~600 MB to ~8 KB
- **Compute**: 151,936 logits → 1 logit per forward pass
- **Speed**: faster inference due to the reduced computation
## Technical Details
- **Architecture**: Qwen3ForSequenceClassification
- **Base Model**: Qwen/Qwen3-Reranker-4B
- **Conversion Method**: from_2_way_softmax (yes_logit - no_logit)
- **Model Size**: 4B parameters
- **Task**: Reranking/Scoring
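The `from_2_way_softmax` conversion can be illustrated with a toy example: collapsing the LM head to a single classifier row whose score equals `yes_logit - no_logit`. Sizes and token ids below are made up for illustration; the real model has a 151,936-token vocabulary.

```python
import random

# Toy sizes for illustration; the real LM head is vocab x hidden = 151,936 x 2560.
hidden, vocab = 8, 10
random.seed(0)
lm_head = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(vocab)]

yes_id, no_id = 3, 7  # hypothetical token ids for "yes" / "no"

# from_2_way_softmax: the single classifier row is (yes row - no row),
# so its score equals yes_logit - no_logit for any hidden state h.
cls_weight = [y - n for y, n in zip(lm_head[yes_id], lm_head[no_id])]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

h = [random.gauss(0, 1) for _ in range(hidden)]
score = dot(h, cls_weight)
assert abs(score - (dot(h, lm_head[yes_id]) - dot(h, lm_head[no_id]))) < 1e-9
print("score matches yes_logit - no_logit")
```

This is why the converted model needs only one output row instead of the full vocabulary-sized LM head.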
## Citation
If you use this model, please cite the original Qwen3-Reranker:
```bibtex
@misc{qwen3reranker2024,
  title = {Qwen3-Reranker},
  author = {{Qwen Team}},
  year = {2024},
  publisher = {Hugging Face}
}
```
## License
Apache 2.0 (inherited from the base model)