---
license: cc-by-nc-4.0
language:
- en
base_model:
- Qwen/Qwen3-4B
pipeline_tag: text-ranking
tags:
- finance
- legal
- code
- stem
- medical
---
# zerank-1: ZeroEntropy Inc.'s SoTA reranker


zerank-1 is an open-weights reranker designed to be integrated into RAG applications, where it re-scores results from first-stage retrieval methods such as embeddings, BM25, and hybrid search.

This reranker outperforms other popular rerankers such as cohere-rerank-v3.5 and Salesforce/Llama-rank-v1 across a wide variety of task domains, including finance, legal, code, STEM, medical, and conversational data. See [this post](https://evals_blog_post) for more details.
The model is trained with an innovative multi-stage pipeline that models query-document relevance scores as adjusted Elo-like ratings. See [this post](https://technical_blog_post) and our Technical Report (Coming soon!) for more details.
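
The card doesn't spell out the rating procedure, but as a rough intuition for what "Elo-like ratings" means, here is a purely illustrative Python sketch that fits per-document ratings from pairwise preference judgments via standard Elo updates. The function names, constants (`k`, `scale`, the 1000.0 initialization), and toy data are assumptions for illustration, not the actual training pipeline.

```python
def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected win probability of A over B under a standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))

def elo_ratings(pairwise_wins, k: float = 32.0, rounds: int = 50) -> dict:
    """Fit per-document ratings from (winner, loser) preference pairs."""
    ratings = {}
    for a, b in pairwise_wins:
        ratings.setdefault(a, 1000.0)
        ratings.setdefault(b, 1000.0)
    for _ in range(rounds):
        for winner, loser in pairwise_wins:
            expected = elo_expected(ratings[winner], ratings[loser])
            # The winner gains rating proportional to how "surprising" the win was.
            ratings[winner] += k * (1.0 - expected)
            ratings[loser] -= k * (1.0 - expected)
    return ratings

# Toy data: for one query, d1 was preferred over d2 twice, d2 over d3 once.
print(elo_ratings([("d1", "d2"), ("d1", "d2"), ("d2", "d3")]))
```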

For this model's smaller twin, see [zerank-1-small](https://huggingface.co/zeroentropy/zerank-1-small).


## How to Use

```python
from sentence_transformers import CrossEncoder

# trust_remote_code=True allows the custom model code shipped with the
# checkpoint to be loaded from the Hub.
model = CrossEncoder("zeroentropy/zerank-1", trust_remote_code=True)

# Each item is a (query, document) pair; pairs are scored independently.
query_documents = [
    ("What is 2+2?", "4"),
    ("What is 2+2?", "The answer is definitely 1 million"),
]

# predict() returns one relevance score per pair; higher means more relevant.
scores = model.predict(query_documents)

print(scores)
```
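
In a RAG pipeline you would typically score one query against many candidate documents and sort by score. A minimal sketch, reusing the `model` object from the snippet above (the query and documents are invented for illustration):

```python
query = "How do I reset my password?"
documents = [
    "Click 'Forgot password' on the login page to start a reset.",
    "Our offices are open Monday through Friday, 9am to 5pm.",
    "Password resets require access to your registered email address.",
]

# Score each (query, document) pair, then sort documents by descending score.
scores = model.predict([(query, doc) for doc in documents])
reranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)

for doc, score in reranked:
    print(f"{score:.4f}  {doc}")
```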

## Evaluations

Comparing NDCG@10 after reranking the top 100 documents retrieved by embeddings (using OpenAI's text-embedding-3-small):

| Task           | Embedding | cohere-rerank-v3.5 | Salesforce/Llama-rank-v1 | zerank-1-small | **zerank-1** |
|----------------|-----------|--------------------|--------------------------|----------------|--------------|
| Code           |    0.678  |       0.724        |           0.694          |      0.730     |   **0.754**  |
| Conversational |    0.250  |       0.571        |           0.484          |      0.556     |   **0.596**  |
| Finance        |    0.839  |       0.824        |           0.828          |      0.861     |   **0.894**  |
| Legal          |    0.703  |       0.804        |           0.767          |      0.817     |   **0.821**  |
| Medical        |    0.619  |       0.750        |           0.719          |      0.773     |   **0.796**  |
| STEM           |    0.401  |       0.510        |           0.595          |      0.680     |   **0.694**  |
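
For reference, NDCG@10 rewards placing highly relevant documents near the top of the first ten results: each result's gain is discounted by the log of its rank, then normalized against the ideal ordering. A minimal sketch of the metric, using the linear-gain DCG variant and made-up relevance labels:

```python
import math

def dcg(relevances, k=10):
    """Discounted cumulative gain: gain at each rank is damped by log2 of the rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg(relevances, k=10):
    """DCG normalized by the ideal ordering, so a perfect ranking scores 1.0."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Toy example: graded relevance labels of returned documents, in ranked order.
print(round(ndcg([3, 2, 3, 0, 1, 2]), 3))
```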

Comparing BM25 and Hybrid Search without and with zerank-1:

<img src="https://cdn-uploads.huggingface.co/production/uploads/67776f9dcd9c9435499eafc8/2GPVHFrI39FspnSNklhsM.png" alt="NDCG@10 for BM25 without and with zerank-1" width="400"/> <img src="https://cdn-uploads.huggingface.co/production/uploads/67776f9dcd9c9435499eafc8/dwYo2D7hoL8QiE8u3yqr9.png" alt="NDCG@10 for hybrid search without and with zerank-1" width="400"/>


## Citation

**BibTeX:**

Coming soon!

**APA:**

Coming soon!