Update README.md
README.md
CHANGED
@@ -214,10 +214,11 @@ print(model.compute_score(sentence_pairs,
## Evaluation


-
-We
-
-
+We compare BGE-M3 with some popular methods, including BM25, OpenAI embeddings, etc.
+We used Pyserini to implement BM25; the test results can be reproduced with this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+To make BM25 and BGE-M3 more comparable, in our experiments
+BM25 used the same tokenizer as BGE-M3 (i.e., the tokenizer of XLM-RoBERTa).
+Using the same vocabulary also ensures that both approaches have the same retrieval latency.


- Multilingual (Miracl dataset)
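The shared-tokenizer setup added in the paragraph above can be sketched as follows. This is a minimal illustration, not the evaluation code: the linked script uses Pyserini, whereas here the `rank_bm25` package stands in for it, and the corpus and query are made-up examples.

```python
from rank_bm25 import BM25Okapi
from transformers import AutoTokenizer

# BGE-M3's tokenizer is the XLM-RoBERTa tokenizer; loading it from the
# model repo keeps BM25 and BGE-M3 on the same subword vocabulary.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")

# Toy corpus for illustration only.
corpus = [
    "BGE-M3 is a multilingual embedding model.",
    "BM25 is a classic lexical retrieval method.",
]

# Index documents over XLM-RoBERTa subword tokens instead of whitespace words.
bm25 = BM25Okapi([tokenizer.tokenize(doc) for doc in corpus])

# Score a query with the same tokenization, so lexical matching happens
# in exactly the vocabulary BGE-M3 sees.
scores = bm25.get_scores(tokenizer.tokenize("What is BGE-M3?"))
print(scores)  # one BM25 score per document
```

Because both systems segment text identically, any difference in retrieval quality (and latency) comes from the scoring model rather than the tokenization.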