Looking for production-ready multi-vector search? Check out TopK, a hybrid retrieval engine built on object storage.
Iso-ModernColBERT
This model is an isotropically corrected version of GTE-ModernColBERT-v1.
It's built for production use cases where retrieval speed and quality matter. Compared to the original model, this version delivers
up to 3x faster inference in bf16 with almost no loss in accuracy and enables scalable multi-vector retrieval through
Sparse Multi-Vector Encoding (SMVE) inside TopK.
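For readers unfamiliar with bf16: the format keeps float32's 8 exponent bits but only 7 explicit mantissa bits, so nearby values can round to the same number. The sketch below emulates that rounding with the standard library only. `to_bf16` is an illustrative helper, not part of PyLate or TopK, and it illustrates the number format rather than the internals of either model.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a finite Python float to the nearest bfloat16 value (returned as a float)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest (ties to even), then zero the 16 low bits that bf16 drops.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + bias) & 0xFFFFFFFF) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Two similarity-like values that differ in the third decimal place.
a, b = 0.901, 0.902
print(to_bf16(a), to_bf16(b))  # 0.90234375 0.90234375
```

Between 0.5 and 1.0 the spacing of bf16 values is 2^-8 (about 0.0039), so differences smaller than that can vanish after rounding.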
Usage
Install PyLate for embeddings and TopK SDK for retrieval.
pip install -U pylate topk-sdk
Embed documents
First, load the model with PyLate's ColBERT class and encode your documents.
import torch
import numpy as np
from pylate import models

model = models.ColBERT(
    model_name_or_path="topk-io/Iso-ModernColBERT",
    model_kwargs={'torch_dtype': torch.bfloat16},
)

documents = [
    "document 1 text",
    "document 2 text",
    "document 3 text",
]

doc_embeddings = model.encode(
    documents,
    batch_size=32,
    # Ensure that it is set to False to indicate that these are documents, not queries
    is_query=False,
    show_progress_bar=True,
)
Store document embeddings
Index the multi-vector document embeddings in TopK, a hybrid retrieval engine built on object storage. To get started, create an API key.
from topk_sdk import Client
from topk_sdk.schema import matrix, multi_vector_index

# Initialize TopK client
client = Client(
    api_key="<TOPK_API_KEY>",
    region="aws-us-east-1-elastica",
)

# Create a collection with multi-vector index
client.collections().create(
    "iso-moderncolbert",
    schema={
        "token_embeddings": matrix(dimension=128, value_type="f16")
            .index(multi_vector_index(metric="maxsim"))
    }
)

# Upsert document embeddings
client.collection("iso-moderncolbert").upsert([
    {
        "_id": str(i),
        "token_embeddings": emb.astype(np.float16),
        "text": text
    }
    for (i, (text, emb)) in enumerate(zip(documents, doc_embeddings))
])
Retrieve documents for queries
Your documents are now durably persisted in the index and queryable.
from topk_sdk.query import fn, select, field

# Encode query string
query_embedding = model.encode(
    "query for document 3",
    # Ensure that it is set to True for queries
    is_query=True,
    show_progress_bar=False,
)

# Retrieve top-k documents using the query embedding
results = client.collection("iso-moderncolbert").query(
    select(
        "_id", "text",
        # Compute maxsim between query and indexed documents
        maxsim_score=fn.multi_vector_distance(
            "token_embeddings",
            query_embedding.astype(np.float16)
        )
    )
    # Get the top 10 matching documents
    .topk(field("maxsim_score"), 10)
)

for r in results:
    print(f"id: {r['_id']}, score: {r['maxsim_score']}, text: {r['text']}")
TopK's query language is flexible: you can tune retrieval parameters, combine multi-vector search with metadata filters and keyword search, and more. Check out our docs to learn more.
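To make the score in the query above concrete, here is a minimal, dependency-free sketch of MaxSim as it is usually defined for ColBERT-style late interaction: for each query token, take the best dot product against any document token, then sum over query tokens. The function name `maxsim` and the toy 2-dimensional vectors are illustrative. Whether TopK's `maxsim` metric applies any extra normalization or approximation is not stated here, so treat this as the textbook definition rather than TopK's implementation.

```python
def maxsim(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product against any document token."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy token embeddings (real ones from this model have 128 dimensions).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has an exact match for each query token
doc_b = [[0.5, 0.5]]                           # one token that partly matches both

print(maxsim(query, doc_a))  # 2.0
print(maxsim(query, doc_b))  # 1.0
```

Because every query token picks its own best document token, a document is rewarded for covering each part of the query, which a single pooled vector cannot express.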
Evaluation results
We evaluated the model with an internal evaluation harness on two standard benchmarks: BEIR and NanoBEIR.
As the baseline, we selected GTE-ModernColBERT-v1 and evaluated its performance in fp32 and bf16 precision (denoted GTE fp32 and GTE bf16, respectively).
The last two columns of each table describe Iso-ModernColBERT (ours) in bf16 precision: Iso bf16 is its score, and Δ vs GTE is its relative change over the GTE baseline named in the column header.
In all configurations we used the same SMVE implementation with width 65536 and k=32.
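As a worked check of how to read the Δ columns: they appear to be relative changes over the baseline, not differences in percentage points, and that reading reproduces the published values. The helper name `relative_delta` is illustrative. The inputs are the BEIR NDCG@10 scores for arguana and for the avg row, copied from the first table in this section.

```python
def relative_delta(ours: float, baseline: float) -> float:
    """Relative change of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# arguana: Iso bf16 = 34.63, GTE bf16 = 30.35 (a 4.28-point absolute gain)
print(f"{relative_delta(34.63, 30.35):+.2f}%")  # +14.10%

# avg row: Iso bf16 = 53.44, GTE bf16 = 47.79
print(f"{relative_delta(53.44, 47.79):+.2f}%")  # +11.82%
```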
BEIR
NDCG@10 — ranking quality is robust to bf16
End-to-end ranking quality reported as NDCG@10, using exact MaxSim scoring (no approximation). Going from fp32 to bf16, GTE-ModernColBERT-v1 loses about 7 NDCG points on average (a 13% relative drop), and the worst-hit datasets (trec-covid, climate-fever, hotpotqa) drop 12–16 points. Iso-ModernColBERT stays close to fp32-level ranking quality in bf16: it recovers most of that gap on every dataset and lands within about 1.5 points of GTE fp32 on average.
| dataset | GTE fp32 N@10 | GTE bf16 N@10 | Iso bf16 N@10 | Δ vs GTE bf16 |
|---|---|---|---|---|
| arguana | 35.81% | 30.35% | 34.63% | +14.10% |
| climate-fever | 32.44% | 19.49% | 31.62% | +62.24% |
| cqadupstack | 40.54% | 38.25% | 40.64% | +6.25% |
| dbpedia | 53.96% | 48.43% | 52.84% | +9.11% |
| fever | 88.80% | 80.67% | 87.08% | +7.95% |
| fiqa | 45.56% | 37.15% | 43.48% | +17.04% |
| hotpotqa | 78.36% | 66.74% | 75.85% | +13.65% |
| msmarco | 46.12% | 41.82% | 45.30% | +8.32% |
| nfcorpus | 37.81% | 35.98% | 37.31% | +3.70% |
| nq | 62.24% | 52.60% | 60.45% | +14.92% |
| quora | 86.63% | 79.58% | 85.05% | +6.87% |
| scidocs | 19.49% | 17.82% | 18.81% | +5.56% |
| scifact | 75.98% | 71.55% | 75.26% | +5.18% |
| touche2020 | 31.30% | 22.93% | 29.45% | +28.43% |
| trec-covid | 89.30% | 73.47% | 83.76% | +14.01% |
| avg | 54.96% | 47.79% | 53.44% | +11.82% |
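For reference, a minimal sketch of how NDCG@10 is commonly computed for one query. It assumes the linear-gain form (relevance divided by a log2 rank discount); the internal harness used for the tables is not published here and may differ in details such as the gain function or tie handling. The function names and the toy relevance lists are illustrative.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k positions (rank 1 has discount log2(2) = 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the system ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels in the order the system returned the documents,
# and the labels of all judged-relevant documents for the same query.
perfect = ndcg_at_k([1, 1, 0, 0], [1, 1])
shifted = ndcg_at_k([0, 1, 0, 1], [1, 1])
print(perfect, round(shifted, 4))
```

The per-dataset numbers in the table are averages of this per-query value over all queries in the dataset.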
Recall@100 — SMVE as a first stage with ~10× overfetch
The following results show model performance when the Sparse Multi-Vector Encoder (SMVE) is used as a first-stage retriever.
For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. With 10× overfetch, Iso-ModernColBERT's SMVE R@100 exceeds the fp32 MaxSim R@10 on average (56.53% vs 54.23%) and on most datasets; cqadupstack, hotpotqa, and quora still fall more than 10 points short.
| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | Iso bf16 SMVE R@100 | Δ vs GTE fp32 SMVE |
|---|---|---|---|---|
| arguana | 72.81% | 27.69% | 84.51% | +205.20% |
| climate-fever | 39.27% | 0.41% | 48.84% | +11,812% ⚠ |
| cqadupstack | 50.48% | 11.78% | 37.29% | +216.55% |
| dbpedia | 30.45% | 8.54% | 36.89% | +331.97% |
| fever | 94.20% | 10.05% | 94.31% | +838.41% |
| fiqa | 52.15% | 6.45% | 49.12% | +661.55% |
| hotpotqa | 80.73% | 12.29% | 66.59% | +441.82% |
| msmarco | 68.64% | 27.77% | 75.83% | +173.07% |
| nfcorpus | 18.03% | 16.63% | 25.60% | +53.94% |
| nq | 82.03% | 14.60% | 78.85% | +440.07% |
| quora | 94.92% | 43.73% | 82.86% | +89.48% |
| scidocs | 20.36% | 12.29% | 29.32% | +138.57% |
| scifact | 87.39% | 60.93% | 90.00% | +47.71% |
| touche2020 | 19.69% | 4.47% | 40.17% | +798.66% |
| trec-covid | 2.27% | 0.89% | 7.73% | +768.54% |
| avg | 54.23% | 17.23% | 56.53% | +228.09% |
⚠ The +11,812% on climate-fever is an artifact of a near-zero baseline (0.41%): GTE's SMVE is so broken on that dataset that the ratio explodes. Read it as "GTE SMVE doesn't work here at all", not as a meaningful magnitude.
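The recall columns above compare two cut-offs: the exact model at depth 10 and the SMVE first stage at depth 100. A minimal sketch of recall@k for a single query, assuming the usual definition (fraction of the relevant documents that appear in the first k results); the function name and document ids are illustrative.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents found among the first k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(relevant.intersection(retrieved_ids[:k]))
    return hits / len(relevant)

# A first stage that ranks one relevant document late still finds it
# if we fetch a deeper candidate pool.
retrieved = ["d7", "d2", "d9", "d4", "d1"]
relevant = ["d1", "d2"]
print(recall_at_k(retrieved, relevant, k=2))  # 0.5
print(recall_at_k(retrieved, relevant, k=5))  # 1.0
```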
Recall@1000 — SMVE as a first stage with ~10× overfetch (deeper pool)
Same picture at the next pool depth: Iso-ModernColBERT SMVE R@1000 exceeds fp32 MaxSim R@100 on average (74.55% vs 72.64%) and stays within about 5 points of it on every dataset except cqadupstack and hotpotqa, while GTE's SMVE collapses.
| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | Iso bf16 SMVE R@1000 | Δ vs GTE fp32 SMVE |
|---|---|---|---|---|
| arguana | 95.72% | 68.31% | 97.00% | +42.00% |
| climate-fever | 66.45% | 0.93% | 68.87% | +7,305% ⚠ |
| cqadupstack | 71.44% | 26.78% | 55.78% | +108.29% |
| dbpedia | 62.50% | 18.35% | 57.72% | +214.55% |
| fever | 97.46% | 16.74% | 96.91% | +478.91% |
| fiqa | 75.64% | 21.09% | 76.70% | +263.68% |
| hotpotqa | 90.31% | 22.72% | 78.83% | +247.05% |
| msmarco | 93.14% | 46.57% | 90.97% | +95.34% |
| nfcorpus | 32.22% | 49.11% | 57.16% | +16.39% |
| nq | 96.59% | 29.88% | 91.42% | +205.96% |
| quora | 99.45% | 69.38% | 94.86% | +36.72% |
| scidocs | 44.07% | 32.62% | 53.43% | +63.80% |
| scifact | 96.00% | 89.82% | 99.33% | +10.59% |
| touche2020 | 52.60% | 13.91% | 69.63% | +400.58% |
| trec-covid | 16.02% | 3.85% | 29.57% | +668.05% |
| avg | 72.64% | 34.00% | 74.55% | +119.26% |
⚠ Again, climate-fever's +7,305% is driven by a near-zero baseline (0.93%) — GTE SMVE simply doesn't work on this dataset.
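The "first stage with overfetch" setup measured above can be sketched as a two-stage search: a cheap first stage returns k × overfetch candidates, and exact MaxSim reranks only those. SMVE itself is not described in this card, so the first stage below is a deliberately crude stand-in (dot product of mean-pooled token vectors), and every name in the sketch is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_vecs, doc_vecs):
    """Exact late-interaction score: best document token per query token, summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def mean_pool(vecs):
    """Collapse a list of token vectors into one vector by averaging."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def search(query_vecs, docs, k, overfetch=10):
    # Stage 1 (placeholder for SMVE): cheap single-vector score, keep k * overfetch ids.
    q_pooled = mean_pool(query_vecs)
    by_cheap = sorted(docs, key=lambda i: dot(q_pooled, mean_pool(docs[i])), reverse=True)
    candidates = by_cheap[: k * overfetch]
    # Stage 2: exact MaxSim on the candidates only.
    return sorted(candidates, key=lambda i: maxsim(query_vecs, docs[i]), reverse=True)[:k]

query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],  # best under exact MaxSim (score 2.0)
    "b": [[0.6, 0.6]],              # best under the cheap pooled score
    "c": [[1.0, 0.0]],
    "d": [[-1.0, 0.0]],
}
print(search(query, docs, k=1, overfetch=1))  # ['b']
print(search(query, docs, k=1, overfetch=2))  # ['a']
```

With no overfetch the cheap stage's mistake is final; fetching a deeper pool lets the exact scorer recover the right document, which is what the R@100 and R@1000 columns measure for SMVE.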
NanoBEIR
NDCG@10 — ranking quality is robust to bf16
End-to-end ranking quality reported as NDCG@10, using exact MaxSim scoring (no approximation). Going from fp32 to bf16, GTE-ModernColBERT-v1 drops about 6 NDCG points on average (a 9% relative drop), and some datasets (ArguAna, ClimateFEVER, FiQA2018, Touche2020) lose 8–13 points. Iso-ModernColBERT keeps fp32-level ranking quality in bf16: its average is within 0.6 points of fp32, and most per-dataset gaps shrink to under 2 points.
| dataset | GTE fp32 N@10 | GTE bf16 N@10 | Iso bf16 N@10 | Δ vs GTE bf16 |
|---|---|---|---|---|
| ArguAna | 51.98% | 43.50% | 54.31% | +24.85% |
| ClimateFEVER | 40.46% | 27.78% | 38.17% | +37.40% |
| DBPedia | 72.82% | 70.39% | 71.56% | +1.66% |
| FEVER | 94.52% | 89.82% | 93.23% | +3.80% |
| FiQA2018 | 56.64% | 44.13% | 55.79% | +26.42% |
| HotpotQA | 89.95% | 85.64% | 90.47% | +5.64% |
| MSMARCO | 70.89% | 68.77% | 72.56% | +5.51% |
| NFCorpus | 39.58% | 39.20% | 38.67% | -1.35% |
| NQ | 77.19% | 69.01% | 73.64% | +6.71% |
| QuoraRetrieval | 97.08% | 90.60% | 96.53% | +6.54% |
| SCIDOCS | 39.85% | 38.02% | 38.14% | +0.32% |
| SciFact | 82.98% | 80.45% | 83.32% | +3.57% |
| Touche2020 | 59.34% | 48.67% | 58.77% | +20.75% |
| avg | 67.18% | 61.23% | 66.55% | +8.69% |
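A note on reading the avg row: it appears to be the unweighted mean over datasets, and its Δ appears to be computed from the two averaged scores rather than by averaging the per-dataset Δ values. The sketch below checks that reading against the GTE bf16 and Iso bf16 columns of the NanoBEIR NDCG@10 table above; the variable names are illustrative.

```python
# NanoBEIR NDCG@10, in table order (ArguAna ... Touche2020).
gte_bf16 = [43.50, 27.78, 70.39, 89.82, 44.13, 85.64, 68.77,
            39.20, 69.01, 90.60, 38.02, 80.45, 48.67]
iso_bf16 = [54.31, 38.17, 71.56, 93.23, 55.79, 90.47, 72.56,
            38.67, 73.64, 96.53, 38.14, 83.32, 58.77]

def mean(xs):
    return sum(xs) / len(xs)

avg_gte = mean(gte_bf16)
avg_iso = mean(iso_bf16)

# Relative change between the two averages (what the avg row reports).
delta_of_avgs = (avg_iso - avg_gte) / avg_gte * 100
# Average of the per-dataset relative changes (a different, larger number).
mean_of_deltas = mean([(i - g) / g * 100 for g, i in zip(gte_bf16, iso_bf16)])

print(f"{avg_gte:.2f} {avg_iso:.2f} {delta_of_avgs:+.2f}% {mean_of_deltas:+.2f}%")
# 61.23 66.55 +8.69% +10.91%
```

The two summaries differ because datasets with low baselines (such as ClimateFEVER) contribute large relative gains to the mean of deltas but little to the ratio of averages.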
Recall@100 — SMVE as a first stage with ~10× overfetch
The following results show model performance when the Sparse Multi-Vector Encoder (SMVE) is used as a first-stage retriever.
For an SMVE first stage to be usable, it needs to surface the candidates that the exact fp32 MaxSim model would have ranked at the top. SMVE on GTE-ModernColBERT-v1 is broken: its compacted latent geometry means random anchors don't separate vectors well. Iso-ModernColBERT's SMVE recovers (and often exceeds) fp32 MaxSim's top-10 within 10× overfetch.
| dataset | GTE fp32 MaxSim R@10 | GTE fp32 SMVE R@100 | Iso bf16 SMVE R@100 | Δ vs GTE fp32 SMVE |
|---|---|---|---|---|
| ArguAna | 80.00% | 32.00% | 90.00% | +181.25% |
| ClimateFEVER | 47.07% | 20.67% | 66.97% | +224.00% |
| DBPedia | 41.21% | 49.00% | 72.85% | +48.67% |
| FEVER | 98.00% | 61.33% | 98.00% | +59.79% |
| FiQA2018 | 64.12% | 23.25% | 78.93% | +239.48% |
| HotpotQA | 92.00% | 46.00% | 90.00% | +95.65% |
| MSMARCO | 92.00% | 84.00% | 98.00% | +16.67% |
| NFCorpus | 15.66% | 16.33% | 24.58% | +50.52% |
| NQ | 88.00% | 70.00% | 95.00% | +35.71% |
| QuoraRetrieval | 98.93% | 87.93% | 96.60% | +9.86% |
| SCIDOCS | 39.67% | 37.87% | 61.17% | +61.53% |
| SciFact | 93.00% | 57.50% | 92.00% | +60.00% |
| Touche2020 | 33.52% | 33.55% | 69.86% | +108.23% |
| avg | 67.94% | 47.65% | 79.53% | +66.91% |
Recall@1000 — SMVE as a first stage with ~10× overfetch (deeper pool)
Same picture at the next pool depth: Iso-ModernColBERT SMVE R@1000 essentially matches or exceeds fp32 MaxSim R@100 across the board, while GTE's SMVE consistently undershoots.
| dataset | GTE fp32 MaxSim R@100 | GTE fp32 SMVE R@1000 | Iso bf16 SMVE R@1000 | Δ vs GTE fp32 SMVE |
|---|---|---|---|---|
| ArguAna | 96.00% | 80.00% | 100.00% | +25.00% |
| ClimateFEVER | 81.17% | 68.80% | 89.03% | +29.40% |
| DBPedia | 85.58% | 84.85% | 96.20% | +13.38% |
| FEVER | 100.00% | 94.33% | 99.00% | +4.95% |
| FiQA2018 | 86.82% | 72.61% | 91.35% | +25.81% |
| HotpotQA | 97.00% | 84.00% | 98.00% | +16.67% |
| MSMARCO | 100.00% | 98.00% | 100.00% | +2.04% |
| NFCorpus | 30.55% | 52.82% | 59.33% | +12.32% |
| NQ | 100.00% | 91.00% | 100.00% | +9.89% |
| QuoraRetrieval | 100.00% | 96.00% | 100.00% | +4.17% |
| SCIDOCS | 70.67% | 78.93% | 90.80% | +15.04% |
| SciFact | 96.00% | 93.00% | 100.00% | +7.53% |
| Touche2020 | 77.23% | 80.46% | 93.09% | +15.70% |
| avg | 86.23% | 82.68% | 93.60% | +13.21% |
Model tree for topk-io/Iso-ModernColBERT
Base model
answerdotai/ModernBERT-base