Feature Extraction
Transformers
Chinese
English
medical
code
math
reasoning
general

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Diver-Retriever-4B

HighLights

The Diver Retriever 4B model is a reasoning-intensive model designed to tackle the challenge of reasonIR and rader. We combined data from the fields of mathematics, coding, and healthcare. At the same time, we made precise matching in terms of the difficulty level of the samples, and uniquely constructed negative samples corresponding to each field. Therefore, this model performs very well on the Bright LeaderBoard as well as the Mteb-Medical Benchmark.

Model Description

  • Model type: Text Embedding
  • Language(s) (NLP): Bilingual (Chinese & English)
  • Context Length: 40k
  • Number of Paramaters: 4B

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our GitHub (https://github.com/AQ-MedAI/Diver).

Evaluation

Method Avg. Bio. Earth. Econ. Psy. Rob. Stack. Sus. Leet. Pony AoPS TheoQ. TheoT.
Evaluate Retriever with Original Query
BM25 14.5 18.9 27.2 14.9 12.5 13.6 18.4 15.0 24.4 7.9 6.2 10.4 4.9
SBERT 14.9 15.1 20.4 16.6 22.7 8.2 11.0 15.3 26.4 7.0 5.3 20.0 10.8
gte-Qwen1.5-7B 22.5 30.6 36.4 17.8 24.6 13.2 22.2 14.8 25.5 9.9 14.4 27.8 32.9
Qwen3-4B 5.6 3.5 8.0 2.3 2.0 1.6 1.0 4.4 2.1 0.1 4.9 18.0 19.2
OpenAI 17.9 23.3 26.7 19.5 27.6 12.8 14.3 20.5 23.6 2.4 8.5 23.5 11.7
Google 20.0 22.7 34.8 19.6 27.8 15.7 20.1 17.1 29.6 3.6 9.3 23.8 15.9
ReasonIR-8B 24.4 26.2 31.4 23.3 30.0 18.0 23.9 20.5 35.0 10.5 14.7 31.9 27.2
RaDeR-7B 25.5 34.6 38.9 22.1 33.0 14.8 22.5 23.7 37.3 5.0 10.2 28.4 35.1
Seed1.5-Embedding 27.2 34.8 46.9 23.4 31.6 19.1 25.4 21.0 43.2 4.9 12.2 33.3 30.5
DIVER-Retriever 28.9 41.8 43.7 21.7 35.3 21.0 21.2 25.1 37.6 13.2 10.7 38.4 37.3
Evaluate Retriever with GPT-4 REASON-query
BM25 27.0 53.6 54.1 24.3 38.7 18.9 27.7 26.3 19.3 17.6 3.9 19.2 20.8
SBERT 17.8 18.5 26.3 17.5 27.2 8.8 11.8 17.5 24.3 10.3 5.0 22.3 23.5
gte-Qwen1.5-7B 24.8 35.5 43.1 24.3 34.3 15.4 22.9 23.9 25.4 5.2 4.6 28.7 34.6
Qwen3-4B 5.5 1.3 17.3 2.5 6.2 1.0 4.8 4.5 3.0 5.9 0.0 7.2 12.5
OpenAI 23.3 35.2 40.1 25.1 38.0 13.6 18.2 24.2 24.5 6.5 7.7 22.9 23.8
Google 26.2 36.4 45.6 25.6 38.2 18.7 29.5 17.9 31.1 3.7 10.0 27.8 30.4
ReasonIR-8B 29.9 43.6 42.9 32.7 38.8 20.9 25.8 27.5 31.5 19.6 7.4 33.1 35.7
RaDeR-7B 29.2 36.1 42.9 25.2 37.9 16.6 27.4 25.0 34.8 11.9 12.0 37.7 43.4
DIVER-Retriever 32.1 51.9 53.5 29.5 41.2 21.4 27.5 26.1 33.5 11.7 9.5 39.3 39.7
Evaluate retriever with DIVER-QExpand query
ReasonIR-8B 32.6 49.4 44.7 32.4 44.0 26.6 31.8 29.0 32.3 12.8 9.1 40.7 38.4
+BM25 (Hybrid) 35.7 56.8 53.5 33.0 48.5 29.4 34.2 32.0 35.2 16.8 12.9 39.3 36.8
DIVER-Retriever 33.9 54.5 52.7 28.8 44.9 25.1 27.4 29.5 34.5 10.0 14.5 40.7 44.7
+BM25 (Hybrid) 37.2 60.0 55.9 31.8 47.9 27.1 33.9 31.9 35.1 23.1 16.8 36.9 46.6

Usage

Inference

Sentence Transformers Usage

# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("AQ-MedAI/Diver-Retriever-4B")


# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)

Transformers Usage

# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('AQ-MedAI/Diver-Retriever-4B', padding_side='left')
model = AutoModel.from_pretrained('AQ-MedAI/Diver-Retriever-4B')


max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7534257769584656, 0.1146894246339798], [0.03198453038930893, 0.6258305311203003]]

Finetuning

We recommend you to use swift to finetune our DIVER-Retriever-4B with infonce.

Before starting training, please ensure your environment is properly configured.

pip install ms-swift -U
# Install from source
pip install git+https://github.com/modelscope/ms-swift.git

pip install transformers -U

# Optional packages
pip install deepspeed # multi-GPU training
pip install liger-kernel # save GPU memory resources
pip install flash-attn --no-build-isolation

Training Command

Using infonce loss as an example, the complete training command is as follows:

nproc_per_node=8
NPROC_PER_NODE=$nproc_per_node \
swift sft \
    --model DIVER/DIVER-Retriever-4B \
    --task_type embedding \
    --model_type qwen3_emb \
    --train_type full \
    --dataset your_dataset \
    --split_dataset_ratio 0.05 \
    --eval_strategy steps \
    --output_dir output \
    --eval_steps 20 \
    --num_train_epochs 5 \
    --save_steps 20 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --learning_rate 6e-6 \
    --loss_type infonce \
    --label_names labels \
    --dataloader_drop_last true \
    --deepspeed zero3

Citation

If you find our work helpful, feel free to give us a cite.

@misc{long2025divermultistageapproachreasoningintensive, title={DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval}, author={Meixiu Long and Duolin Sun and Dan Yang and Junjie Wang and Yue Shen and Jian Wang and Peng Wei and Jinjie Gu and Jiahai Wang}, year={2025}, eprint={2508.07995}, archivePrefix={arXiv}, primaryClass={cs.IR}, url={https://arxiv.org/abs/2508.07995}, }

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AQ-MedAI/Diver-Retriever-4B

Base model

Qwen/Qwen3-4B-Base
Finetuned
(2)
this model

Datasets used to train AQ-MedAI/Diver-Retriever-4B