---
license: cc-by-nc-4.0
language:
  - en
base_model:
  - Qwen/Qwen3-4B
pipeline_tag: text-ranking
tags:
  - finance
  - legal
  - code
  - stem
  - medical
---

# zerank-1: ZeroEntropy Inc.'s SoTA reranker

zerank-1 is an open-weights reranker meant to be integrated into RAG applications to rerank results from first-stage retrieval methods such as embeddings, BM25, and hybrid search.

This reranker outperforms other popular rerankers such as cohere-rerank-v3.5 and Salesforce/Llama-rank-v1 across a wide variety of task domains, including finance, legal, code, STEM, medical, and conversational data. See this post for more details. The model is trained with an innovative multi-stage pipeline that models query-document relevance scores using adjusted Elo-like ratings. See this post and our Technical Report (coming soon!) for more details.
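For intuition on the Elo-style framing, the standard Elo expected-score function is shown below; this is only an illustration of the general idea, not necessarily the adjusted scheme used in training (see the linked post and the forthcoming technical report for the actual formulation). Under plain Elo, a document with rating $R_A$ is expected to win a pairwise relevance comparison against a document with rating $R_B$ with probability:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$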

For this model's smaller twin, see zerank-1-small.

## How to Use

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("zeroentropy/zerank-1", trust_remote_code=True)

query_documents = [
    ("What is 2+2?", "4"),
    ("What is 2+2?", "The answer is definitely 1 million"),
]
scores = model.predict(query_documents)
print(scores)
```
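As a minimal sketch of how the reranker might sit behind a first-stage retriever, the snippet below scores candidates against a query and keeps the top results. The `rerank` helper and the candidate documents are illustrative, not part of the model's API; candidates would normally come from embeddings, BM25, or hybrid search.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("zeroentropy/zerank-1", trust_remote_code=True)

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    """Score (query, document) pairs and return the top_k documents by reranker score."""
    scores = model.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Illustrative candidates; in practice these come from a first-stage retriever.
candidates = [
    "Paris is the capital of France.",
    "The Eiffel Tower is located in Paris.",
    "Berlin is the capital of Germany.",
]
for doc, score in rerank("What is the capital of France?", candidates, top_k=2):
    print(f"{score:.3f}  {doc}")
```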

## Evaluations

Comparing NDCG@10 starting from the top 100 documents retrieved by embedding (using text-embedding-3-small):

| Task | Embedding | cohere-rerank-v3.5 | Salesforce/Llama-rank-v1 | zerank-1-small | zerank-1 |
|---|---|---|---|---|---|
| Code | 0.678 | 0.724 | 0.694 | 0.730 | 0.754 |
| Conversational | 0.250 | 0.571 | 0.484 | 0.556 | 0.596 |
| Finance | 0.839 | 0.824 | 0.828 | 0.861 | 0.894 |
| Legal | 0.703 | 0.804 | 0.767 | 0.817 | 0.821 |
| Medical | 0.619 | 0.750 | 0.719 | 0.773 | 0.796 |
| STEM | 0.401 | 0.510 | 0.595 | 0.680 | 0.694 |
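NDCG@10 rewards rankings that place the most relevant documents within the top 10 positions. A minimal sketch of the metric (linear-gain variant; the relevance labels in the example are illustrative, not taken from the benchmark):

```python
import math

def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top k positions (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG of the ranking, normalized by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance of each document in the order the reranker returned it (illustrative).
print(ndcg_at_k([3, 2, 3, 0, 1, 2], k=10))
```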

Comparing BM25 and Hybrid Search without and with zerank-1:


## Citation

BibTeX:

Coming soon!

APA:

Coming soon!