Looking for production-ready multi-vector search? Check out TopK, a hybrid retrieval engine built on object storage.

Iso-ModernColBERT

This model is an isotropically corrected version of GTE-ModernColBERT-v1. It's built for production use cases where retrieval speed and quality matter. Compared to the original model, this version delivers up to 3x faster inference in bf16 with almost no loss in accuracy and enables scalable multi-vector retrieval through Sparse Multi-Vector Encoding (SMVE) inside TopK.

Usage

Install PyLate for embeddings and the TopK SDK for retrieval.

pip install -U pylate topk-sdk

Embed documents

First, load the model into PyLate's ColBERT class and encode your documents.

import torch
import numpy as np
from pylate import models


model = models.ColBERT(
    model_name_or_path="topk-io/Iso-ModernColBERT",
    model_kwargs={'torch_dtype': torch.bfloat16},
)

documents = [
    "document 1 text",
    "document 2 text",
    "document 3 text",
]

doc_embeddings = model.encode(
    documents,
    batch_size=32,
    # Ensure that it is set to False to indicate that these are documents, not queries
    is_query=False,
    show_progress_bar=True,
)

Store document embeddings

Index the multi-vector document embeddings in TopK, a hybrid retrieval engine built on object storage. To get started, create an API key.

from topk_sdk import Client
from topk_sdk.schema import matrix, multi_vector_index

# Initialize TopK client
client = Client(
    api_key="<TOPK_API_KEY>",
    region="aws-us-east-1-elastica",
)

# Create a collection with multi-vector index
client.collections().create(
    "iso-moderncolbert",
    schema={
        "token_embeddings": matrix(dimension=128, value_type="f16")
            .index(multi_vector_index(metric="maxsim"))
    },
)

# Upsert document embeddings
client.collection("iso-moderncolbert").upsert([
    {
        "_id": str(i),
        "token_embeddings": emb.astype(np.float16),
        "text": text,
    }
    for i, (text, emb) in enumerate(zip(documents, doc_embeddings))
])
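
The schema above declares `token_embeddings` as a 128-dimensional `f16` matrix, so each document's embedding should be a `(num_tokens, 128)` array before it is upserted. The following is a minimal sketch of a pre-upsert check; the helper name `to_index_matrix` is hypothetical and not part of PyLate or the TopK SDK, and a random array stands in for a real embedding.

```python
import numpy as np

def to_index_matrix(emb, dimension=128):
    """Check that `emb` is a (num_tokens, dimension) matrix and cast it to float16."""
    matrix = np.asarray(emb)
    if matrix.ndim != 2 or matrix.shape[1] != dimension:
        raise ValueError(f"expected shape (num_tokens, {dimension}), got {matrix.shape}")
    return matrix.astype(np.float16)

# Stand-in for one document's token embeddings (12 tokens, 128 dimensions)
fake_embedding = np.random.default_rng(0).standard_normal((12, 128)).astype(np.float32)
prepared = to_index_matrix(fake_embedding)
```

A check like this fails fast on a mis-shaped array instead of sending it to the index.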

Retrieve documents for queries

Your documents are now durably persisted in the index and queryable.

from topk_sdk.query import fn, select, field

# Encode query string
query_embedding = model.encode(
    "query for document 3",
    # Ensure that it is set to True for queries
    is_query=True,
    show_progress_bar=False,
)

# Retrieve top-k documents using the query embedding
results = client.collection("iso-moderncolbert").query(
    select(
        "_id", "text",
        # Compute maxsim between query and indexed documents
        maxsim_score = fn.multi_vector_distance(
            "token_embeddings",
            query_embedding.astype(np.float16)
        )
    )
    # Get the top 10 matching documents
    .topk(field("maxsim_score"), 10)
)

for r in results:
    print(f"id: {r['_id']}, score: {r['maxsim_score']}, text: {r['text']}")
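
For reference, the score requested above is the ColBERT-style late-interaction MaxSim: each query token is matched to its most similar document token, and those maxima are summed. Below is a minimal NumPy sketch of that definition, assuming dot-product similarity between token vectors. It is an illustration only, not TopK's implementation, and the toy vectors are 2-dimensional rather than 128-dimensional.

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction MaxSim: for each query token take the best-matching
    document token (by dot product), then sum over query tokens."""
    # Similarity matrix of shape (num_query_tokens, num_doc_tokens)
    sim = query_emb.astype(np.float32) @ doc_emb.astype(np.float32).T
    return float(sim.max(axis=1).sum())

# Toy example with 2-dimensional token vectors
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
doc_b = np.array([[1.0, 0.0]])

score_a = maxsim(query, doc_a)  # both query tokens find an exact match -> 2.0
score_b = maxsim(query, doc_b)  # only the first query token matches -> 1.0
```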

TopK's query language is flexible: you can tune retrieval parameters and combine multi-vector search with metadata filters, keyword search, and more. Check out our docs to learn more.

Evaluation results

We evaluated the model with an internal evaluation harness on two standard benchmarks: BEIR and NanoBEIR. As the baseline we used GTE-ModernColBERT-v1, evaluated in fp32 and bf16 precision (denoted GTE fp32 and GTE bf16, respectively). The last two columns of each table, Iso bf16 and Δ vs GTE, describe Iso-ModernColBERT (ours) in bf16 precision; Δ is the relative change over the named GTE baseline. All configurations use the same SMVE implementation with width 65536 and k=32.
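
The Δ columns in the tables are relative changes, not absolute point differences. Here is a minimal sketch of the arithmetic, using the arguana row of the BEIR NDCG@10 table; the helper name `relative_delta` is ours, for illustration only.

```python
def relative_delta(baseline: float, ours: float) -> float:
    """Relative change of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# BEIR arguana, NDCG@10: GTE bf16 = 30.35%, Iso bf16 = 34.63%
delta = relative_delta(30.35, 34.63)
print(f"{delta:+.2f}%")  # +14.10%
```

In absolute terms the same row is a gain of 4.28 NDCG points.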

BEIR

NDCG@10 — ranking quality is robust to bf16

End-to-end ranking quality reported as NDCG@10, using exact MaxSim scoring (no approximation). GTE-ModernColBERT-v1 loses ~7 NDCG points on average going from fp32 → bf16 — about a 13% relative drop — with the worst-hit datasets (trec-covid, climate-fever, hotpotqa) dropping 12–16 points. Iso-ModernColBERT keeps fp32-level ranking quality in bf16, recovering most of that gap on average and on every dataset.

| dataset | GTE fp32 N@10 | GTE bf16 N@10 | Iso bf16 N@10 | Δ vs GTE bf16 |
| --- | --- | --- | --- | --- |
| arguana | 35.81% | 30.35% | 34.63% | +14.10% |
| climate-fever | 32.44% | 19.49% | 31.62% | +62.24% |
| cqadupstack | 40.54% | 38.25% | 40.64% | +6.25% |
| dbpedia | 53.96% | 48.43% | 52.84% | +9.11% |
| fever | 88.80% | 80.67% | 87.08% | +7.95% |
| fiqa | 45.56% | 37.15% | 43.48% | +17.04% |
| hotpotqa | 78.36% | 66.74% | 75.85% | +13.65% |
| msmarco | 46.12% | 41.82% | 45.30% | +8.32% |
| nfcorpus | 37.81% | 35.98% | 37.31% | +3.70% |
| nq | 62.24% | 52.60% | 60.45% | +14.92% |
| quora | 86.63% | 79.58% | 85.05% | +6.87% |
| scidocs | 19.49% | 17.82% | 18.81% | +5.56% |
| scifact | 75.98% | 71.55% | 75.26% | +5.18% |
| touche2020 | 31.30% | 22.93% | 29.45% | +28.43% |
| trec-covid | 89.30% | 73.47% | 83.76% | +14.01% |
| avg | 54.96% | 47.79% | 53.44% | +11.82% |
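
For readers unfamiliar with the metric, NDCG@10 divides the discounted gain of the returned top-10 ranking by that of the ideal ranking. The sketch below assumes one common formulation (linear gain, log2 discount). This card does not state which variant the internal harness uses, so treat it as an illustration of the metric rather than a reproduction of the evaluation.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the returned ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels of the returned documents, in ranked order
perfect = ndcg_at_k([1, 1, 0, 0], all_relevances=[1, 1, 0, 0])
swapped = ndcg_at_k([0, 1, 1, 0], all_relevances=[1, 1, 0, 0])
```

Here `perfect` is 1.0, while `swapped` is lower because both relevant documents sit one rank further down.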

Recall@100 — SMVE as a first stage with ~10× overfetch

The following results show model performance when SMVE is used as the first-stage retriever.

For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. With 10× overfetch, Iso-ModernColBERT's SMVE R@100 exceeds the fp32 MaxSim R@10 on average (56.53% vs 54.23%) and on most datasets, though it trails on a few (cqadupstack, fiqa, hotpotqa, nq, quora).

| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | Iso bf16 SMVE R@100 | Δ vs GTE fp32 SMVE |
| --- | --- | --- | --- | --- |
| arguana | 72.81% | 27.69% | 84.51% | +205.20% |
| climate-fever | 39.27% | 0.41% | 48.84% | +11,812% |
| cqadupstack | 50.48% | 11.78% | 37.29% | +216.55% |
| dbpedia | 30.45% | 8.54% | 36.89% | +331.97% |
| fever | 94.20% | 10.05% | 94.31% | +838.41% |
| fiqa | 52.15% | 6.45% | 49.12% | +661.55% |
| hotpotqa | 80.73% | 12.29% | 66.59% | +441.82% |
| msmarco | 68.64% | 27.77% | 75.83% | +173.07% |
| nfcorpus | 18.03% | 16.63% | 25.60% | +53.94% |
| nq | 82.03% | 14.60% | 78.85% | +440.07% |
| quora | 94.92% | 43.73% | 82.86% | +89.48% |
| scidocs | 20.36% | 12.29% | 29.32% | +138.57% |
| scifact | 87.39% | 60.93% | 90.00% | +47.71% |
| touche2020 | 19.69% | 4.47% | 40.17% | +798.66% |
| trec-covid | 2.27% | 0.89% | 7.73% | +768.54% |
| avg | 54.23% | 17.23% | 56.53% | +228.09% |

⚠ The +11,812% on climate-fever is an artifact of a near-zero baseline (0.41%): GTE's SMVE is so broken on that dataset that the ratio explodes. Read it as "GTE SMVE doesn't work here at all", not as a meaningful magnitude.
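
The table above compares recall at two different depths: the exact ranker at depth k against the first stage at depth 10·k. Below is a minimal sketch of that comparison with made-up document ids and k=1 for brevity; the rankings are hypothetical and do not come from the benchmark.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k retrieved list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)

relevant = ["d1", "d2", "d3", "d4"]
# Hypothetical rankings for one query
exact_ranking = ["d1", "d9"]
first_stage_ranking = ["d7", "d1", "d9", "d2", "d5", "d3", "d8", "d6", "d0", "d10"]

exact_r1 = recall_at_k(exact_ranking, relevant, k=1)               # 1 of 4 relevant
first_stage_r10 = recall_at_k(first_stage_ranking, relevant, k=10)  # 3 of 4 relevant
```

A first stage is usable in this sense when its recall at the overfetched depth is at least the exact ranker's recall at the target depth.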

Recall@1000 — SMVE as a first stage with ~10× overfetch (deeper pool)

Same picture at the next pool depth: on average, Iso-ModernColBERT SMVE R@1000 (74.55%) exceeds fp32 MaxSim R@100 (72.64%), though it trails on several datasets (notably cqadupstack and hotpotqa), while GTE's SMVE collapses.

| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | Iso bf16 SMVE R@1000 | Δ vs GTE fp32 SMVE |
| --- | --- | --- | --- | --- |
| arguana | 95.72% | 68.31% | 97.00% | +42.00% |
| climate-fever | 66.45% | 0.93% | 68.87% | +7,305% |
| cqadupstack | 71.44% | 26.78% | 55.78% | +108.29% |
| dbpedia | 62.50% | 18.35% | 57.72% | +214.55% |
| fever | 97.46% | 16.74% | 96.91% | +478.91% |
| fiqa | 75.64% | 21.09% | 76.70% | +263.68% |
| hotpotqa | 90.31% | 22.72% | 78.83% | +247.05% |
| msmarco | 93.14% | 46.57% | 90.97% | +95.34% |
| nfcorpus | 32.22% | 49.11% | 57.16% | +16.39% |
| nq | 96.59% | 29.88% | 91.42% | +205.96% |
| quora | 99.45% | 69.38% | 94.86% | +36.72% |
| scidocs | 44.07% | 32.62% | 53.43% | +63.80% |
| scifact | 96.00% | 89.82% | 99.33% | +10.59% |
| touche2020 | 52.60% | 13.91% | 69.63% | +400.58% |
| trec-covid | 16.02% | 3.85% | 29.57% | +668.05% |
| avg | 72.64% | 34.00% | 74.55% | +119.26% |

⚠ Again, climate-fever's +7,305% is driven by a near-zero baseline (0.93%) — GTE SMVE simply doesn't work on this dataset.
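
The "first stage with overfetch" setup measured above can be sketched as a two-stage search: a cheap scorer selects `k * overfetch` candidates, and exact MaxSim reranks only that pool. In the sketch below the first stage is a stand-in (dot product of mean-pooled token vectors), chosen purely so that the example runs. It is not SMVE, whose internals this card does not describe, and the data is random.

```python
import numpy as np

def maxsim(q, d):
    """Exact late-interaction score: sum over query tokens of the best dot product."""
    return float((q @ d.T).max(axis=1).sum())

def two_stage_search(query, docs, k=2, overfetch=10):
    # Stage 1 (stand-in, NOT SMVE): cheap score from mean-pooled token vectors
    q_pooled = query.mean(axis=0)
    cheap = np.array([q_pooled @ d.mean(axis=0) for d in docs])
    n_candidates = min(len(docs), k * overfetch)
    candidates = np.argsort(-cheap)[:n_candidates]
    # Stage 2: exact MaxSim on the candidate pool only
    exact = [(int(i), maxsim(query, docs[i])) for i in candidates]
    exact.sort(key=lambda pair: pair[1], reverse=True)
    return exact[:k]

rng = np.random.default_rng(0)
# 200 random "documents" with 5 to 19 token vectors of dimension 16 each
docs = [rng.standard_normal((rng.integers(5, 20), 16)).astype(np.float32) for _ in range(200)]
query = rng.standard_normal((4, 16)).astype(np.float32)

top = two_stage_search(query, docs, k=2, overfetch=10)  # list of (doc_index, score)
```

The recall of the candidate pool bounds the quality of the final ranking, which is why the tables report first-stage recall at the overfetched depth.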

NanoBEIR

NDCG@10 — ranking quality is robust to bf16

End-to-end ranking quality reported as NDCG@10, using exact MaxSim scoring (no approximation). GTE-ModernColBERT-v1 drops ~6 NDCG points on average going from fp32 → bf16, about a 9% relative drop, with some datasets (ArguAna, ClimateFEVER, FiQA, Touche2020) losing 8–13 points. Iso-ModernColBERT keeps fp32-level ranking quality in bf16: its average is within about 0.6 points of fp32, and most per-dataset gaps narrow to within a few points.

| dataset | GTE fp32 N@10 | GTE bf16 N@10 | Iso bf16 N@10 | Δ vs GTE bf16 |
| --- | --- | --- | --- | --- |
| ArguAna | 51.98% | 43.50% | 54.31% | +24.85% |
| ClimateFEVER | 40.46% | 27.78% | 38.17% | +37.40% |
| DBPedia | 72.82% | 70.39% | 71.56% | +1.66% |
| FEVER | 94.52% | 89.82% | 93.23% | +3.80% |
| FiQA2018 | 56.64% | 44.13% | 55.79% | +26.42% |
| HotpotQA | 89.95% | 85.64% | 90.47% | +5.64% |
| MSMARCO | 70.89% | 68.77% | 72.56% | +5.51% |
| NFCorpus | 39.58% | 39.20% | 38.67% | -1.35% |
| NQ | 77.19% | 69.01% | 73.64% | +6.71% |
| QuoraRetrieval | 97.08% | 90.60% | 96.53% | +6.54% |
| SCIDOCS | 39.85% | 38.02% | 38.14% | +0.32% |
| SciFact | 82.98% | 80.45% | 83.32% | +3.57% |
| Touche2020 | 59.34% | 48.67% | 58.77% | +20.75% |
| avg | 67.18% | 61.23% | 66.55% | +8.69% |
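
As background on why precision matters here: bf16 keeps 8 bits of significand precision (7 stored), compared with 24 for fp32, so values that differ by less than the local spacing round to the same number. The sketch below emulates the fp32 → bf16 round trip in NumPy with round-to-nearest-even. It only illustrates the resolution of the number format; it does not model where in the network the error arises or why a given model is more or less sensitive to it.

```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Emulate a float32 -> bfloat16 -> float32 round trip (round-to-nearest-even).
    NaN/inf handling is ignored; this is a demonstration only."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits
    rounding_bias = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    rounded = (bits + rounding_bias) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)

# Between 16 and 32 the bf16 spacing is 0.125, so these two values collapse
scores = np.array([20.00, 20.05], dtype=np.float32)
scores_bf16 = to_bf16(scores)  # both become 20.0
```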

Recall@100 — SMVE as a first stage with ~10× overfetch

The following results show model performance when SMVE is used as the first-stage retriever.

For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. Iso-ModernColBERT's SMVE recovers (and often exceeds) fp32 MaxSim's top-10 within 10× overfetch.

| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | Iso bf16 SMVE R@100 | Δ vs GTE fp32 SMVE |
| --- | --- | --- | --- | --- |
| ArguAna | 80.00% | 32.00% | 90.00% | +181.25% |
| ClimateFEVER | 47.07% | 20.67% | 66.97% | +224.00% |
| DBPedia | 41.21% | 49.00% | 72.85% | +48.67% |
| FEVER | 98.00% | 61.33% | 98.00% | +59.79% |
| FiQA2018 | 64.12% | 23.25% | 78.93% | +239.48% |
| HotpotQA | 92.00% | 46.00% | 90.00% | +95.65% |
| MSMARCO | 92.00% | 84.00% | 98.00% | +16.67% |
| NFCorpus | 15.66% | 16.33% | 24.58% | +50.52% |
| NQ | 88.00% | 70.00% | 95.00% | +35.71% |
| QuoraRetrieval | 98.93% | 87.93% | 96.60% | +9.86% |
| SCIDOCS | 39.67% | 37.87% | 61.17% | +61.53% |
| SciFact | 93.00% | 57.50% | 92.00% | +60.00% |
| Touche2020 | 33.52% | 33.55% | 69.86% | +108.23% |
| avg | 67.94% | 47.65% | 79.53% | +66.91% |
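
The claim that random anchors separate vectors poorly in a compacted geometry can be illustrated with a toy experiment. The sketch assigns each vector to its nearest random anchor and counts how many distinct anchors get used. This is not the SMVE algorithm, which this card does not specify; the anchor count, dimensions and noise scale are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def distinct_nearest_anchors(vectors, anchors):
    """Number of distinct anchors that are nearest (by dot product) to at least one vector."""
    nearest = np.argmax(vectors @ anchors.T, axis=1)
    return len(np.unique(nearest))

rng = np.random.default_rng(0)
dim, n_vectors, n_anchors = 128, 1000, 256
anchors = normalize(rng.standard_normal((n_anchors, dim)))

# Isotropic: directions spread evenly over the sphere
isotropic = normalize(rng.standard_normal((n_vectors, dim)))
# Compacted: every vector is one shared direction plus a small perturbation
shared = rng.standard_normal((1, dim))
compacted = normalize(shared + 0.1 * rng.standard_normal((n_vectors, dim)))

used_iso = distinct_nearest_anchors(isotropic, anchors)
used_compacted = distinct_nearest_anchors(compacted, anchors)
print(used_iso, used_compacted)
```

In this toy setting the compacted vectors fall onto only a handful of anchors, so anchor identity carries little information for telling them apart, whereas the isotropic vectors spread across most of the anchors.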

Recall@1000 — SMVE as a first stage with ~10× overfetch (deeper pool)

Same picture at the next pool depth: Iso-ModernColBERT SMVE R@1000 essentially matches or exceeds fp32 MaxSim R@100 across the board, while GTE's SMVE consistently undershoots.

| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | Iso bf16 SMVE R@1000 | Δ vs GTE fp32 SMVE |
| --- | --- | --- | --- | --- |
| ArguAna | 96.00% | 80.00% | 100.00% | +25.00% |
| ClimateFEVER | 81.17% | 68.80% | 89.03% | +29.40% |
| DBPedia | 85.58% | 84.85% | 96.20% | +13.38% |
| FEVER | 100.00% | 94.33% | 99.00% | +4.95% |
| FiQA2018 | 86.82% | 72.61% | 91.35% | +25.81% |
| HotpotQA | 97.00% | 84.00% | 98.00% | +16.67% |
| MSMARCO | 100.00% | 98.00% | 100.00% | +2.04% |
| NFCorpus | 30.55% | 52.82% | 59.33% | +12.32% |
| NQ | 100.00% | 91.00% | 100.00% | +9.89% |
| QuoraRetrieval | 100.00% | 96.00% | 100.00% | +4.17% |
| SCIDOCS | 70.67% | 78.93% | 90.80% | +15.04% |
| SciFact | 96.00% | 93.00% | 100.00% | +7.53% |
| Touche2020 | 77.23% | 80.46% | 93.09% | +15.70% |
| avg | 86.23% | 82.68% | 93.60% | +13.21% |