Update README.md
# opensearch-neural-sparse-encoding-v1

## Select the model

The model should be selected by weighing search relevance against model inference and retrieval efficiency (FLOPS). We benchmark the models' **zero-shot performance** on a subset of the BEIR benchmark: TrecCovid, NFCorpus, NQ, HotpotQA, FiQA, ArguAna, Touche, DBPedia, SCIDOCS, FEVER, Climate FEVER, SciFact, and Quora.
Overall, the v2 series of models have better search relevance, efficiency, and inference speed than the v1 series.

| Model | Inference-free for Retrieval | Model Parameters | AVG NDCG@10 | AVG FLOPS |
|:---|:---:|:---:|:---:|:---:|
| [opensearch-neural-sparse-encoding-doc-v2-distill](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill) | ✔️ | 67M | 0.504 | 1.8 |
| [opensearch-neural-sparse-encoding-doc-v2-mini](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini) | ✔️ | 23M | 0.497 | 1.7 |
## Overview

This is a learned sparse retrieval model. It encodes queries and documents into 30522-dimensional **sparse vectors**. Each non-zero dimension corresponds to a token in the vocabulary, and its weight reflects the importance of that token.

The model is trained on the MS MARCO dataset.

The OpenSearch neural sparse feature supports learned sparse retrieval with the Lucene inverted index (see https://opensearch.org/docs/latest/query-dsl/specialized/neural-sparse/). Indexing and search can be performed with the OpenSearch high-level API, as sketched below.
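As a rough illustration (not an excerpt from the OpenSearch documentation), a neural sparse search through the opensearch-py client could look like the sketch below; the index name, the `passage_embedding` rank_features field, and the model_id are placeholders for a sparse encoder registered and deployed in your own cluster:

```python
from opensearchpy import OpenSearch

# Connect to a local OpenSearch cluster (adjust host, auth, and TLS settings for your setup).
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
)

# A neural_sparse query: OpenSearch expands `query_text` into token weights with the
# deployed sparse encoder and matches them against the Lucene inverted index.
query = {
    "query": {
        "neural_sparse": {
            "passage_embedding": {                       # placeholder rank_features field
                "query_text": "What's the weather in ny now?",
                "model_id": "<deployed-model-id>",       # placeholder model id
            }
        }
    }
}

response = client.search(index="my-neural-sparse-index", body=query)  # placeholder index
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("passage_text"))          # placeholder source field
```

For indexing, an ingest pipeline with a sparse encoding processor can populate the same rank_features field at write time.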
## Usage (HuggingFace)

This model is designed to run inside an OpenSearch cluster, but you can also use it outside the cluster through the HuggingFace models API.
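The sketch below is a minimal example rather than the model card's official snippet: it assumes the Hugging Face model id `opensearch-project/opensearch-neural-sparse-encoding-v1` (following the naming of the model links above) and pools token logits with a max over the sequence followed by log(1 + ReLU) saturation, a common recipe for learned sparse encoders:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "opensearch-project/opensearch-neural-sparse-encoding-v1"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

query = "What's the weather in ny now?"
features = tokenizer([query], padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**features).logits  # batch_size x seq_len x vocab_size (30522)

# Pool token logits into one sparse vector per input: mask out padding, take the
# max over the sequence, then apply log(1 + ReLU(x)) saturation.
values, _ = torch.max(logits * features["attention_mask"].unsqueeze(-1), dim=1)
sparse_vector = torch.log1p(torch.relu(values))  # batch_size x 30522

# Inspect the highest-weighted vocabulary tokens. Special tokens such as [CLS] and
# [SEP] may additionally need to be zeroed out before indexing.
top = torch.topk(sparse_vector[0], k=10)
for token, weight in zip(tokenizer.convert_ids_to_tokens(top.indices.tolist()), top.values.tolist()):
    print(f"{token}\t{weight:.3f}")
```

Each entry of the resulting vector is a vocabulary token weight; zero entries can be dropped and the remaining token-weight pairs written into a rank_features field for retrieval.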