Update README.md
# opensearch-neural-sparse-encoding-v1

## Select the model

The model should be selected by weighing search relevance against model inference and retrieval efficiency (FLOPS). We benchmark the models' **zero-shot performance** on a subset of the BEIR benchmark: TrecCovid, NFCorpus, NQ, HotpotQA, FiQA, ArguAna, Touche, DBPedia, SCIDOCS, FEVER, Climate FEVER, SciFact, and Quora.
Overall, the v2 series of models have better search relevance, efficiency, and inference speed than the v1 series.

| Model | Inference-free for Retrieval | Model Parameters | AVG NDCG@10 | AVG FLOPS |
|:---|:---:|:---:|:---:|:---:|
| [opensearch-neural-sparse-encoding-doc-v2-distill](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill) | ✔️ | 67M | 0.504 | 1.8 |
| [opensearch-neural-sparse-encoding-doc-v2-mini](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini) | ✔️ | 23M | 0.497 | 1.7 |
## Overview

This is a learned sparse retrieval model. It encodes queries and documents into 30522-dimensional **sparse vectors**. Each non-zero dimension corresponds to a token in the vocabulary, and its weight reflects the importance of that token.

The model is trained on the MS MARCO dataset.

The OpenSearch neural sparse feature supports learned sparse retrieval with the Lucene inverted index (see https://opensearch.org/docs/latest/query-dsl/specialized/neural-sparse/). Indexing and search can be performed with the OpenSearch high-level API, as sketched below.
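As a rough illustration (not an excerpt from the OpenSearch documentation), a neural sparse search through the opensearch-py client could look like the sketch below; the index name, the `passage_embedding` rank_features field, and the model_id are placeholders for a sparse encoder registered and deployed in your own cluster:

```python
from opensearchpy import OpenSearch

# Connect to a local OpenSearch cluster (adjust host, auth, and TLS settings for your setup).
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
)

# A neural_sparse query: OpenSearch expands `query_text` into token weights with the
# deployed sparse encoder and matches them against the Lucene inverted index.
query = {
    "query": {
        "neural_sparse": {
            "passage_embedding": {                       # placeholder rank_features field
                "query_text": "What's the weather in ny now?",
                "model_id": "<deployed-model-id>",       # placeholder model id
            }
        }
    }
}

response = client.search(index="my-neural-sparse-index", body=query)  # placeholder index
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("passage_text"))          # placeholder source field
```

For indexing, an ingest pipeline with a sparse encoding processor can populate the same rank_features field at write time.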
## Usage (HuggingFace)

This model is designed to run inside an OpenSearch cluster, but you can also use it outside the cluster through the HuggingFace models API.
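The sketch below is a minimal example rather than the model card's official snippet: it assumes the Hugging Face model id `opensearch-project/opensearch-neural-sparse-encoding-v1` (following the naming of the model links above) and pools token logits with a max over the sequence followed by log(1 + ReLU) saturation, a common recipe for learned sparse encoders:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "opensearch-project/opensearch-neural-sparse-encoding-v1"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

query = "What's the weather in ny now?"
features = tokenizer([query], padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**features).logits  # batch_size x seq_len x vocab_size (30522)

# Pool token logits into one sparse vector per input: mask out padding, take the
# max over the sequence, then apply log(1 + ReLU(x)) saturation.
values, _ = torch.max(logits * features["attention_mask"].unsqueeze(-1), dim=1)
sparse_vector = torch.log1p(torch.relu(values))  # batch_size x 30522

# Inspect the highest-weighted vocabulary tokens. Special tokens such as [CLS] and
# [SEP] may additionally need to be zeroed out before indexing.
top = torch.topk(sparse_vector[0], k=10)
for token, weight in zip(tokenizer.convert_ids_to_tokens(top.indices.tolist()), top.values.tolist()):
    print(f"{token}\t{weight:.3f}")
```

Each entry of the resulting vector is a vocabulary token weight; zero entries can be dropped and the remaining token-weight pairs written into a rank_features field for retrieval.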