Multilingual-E5-base-int8-ov
This is the Multilingual-E5-base model converted to the OpenVINO™ IR (Intermediate Representation) format, with weights quantized to INT8.
Disclaimer: This model is provided as a preview and may be updated in the future.
Multilingual E5 Text Embeddings: A Technical Report. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei, arXiv 2024
This model has 12 layers and the embedding size is 768.
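A conversion like this can be reproduced with optimum-intel. The sketch below assumes default INT8 weight-only quantization settings, which may differ from the exact settings used to produce this checkpoint.

from optimum.intel import OVModelForFeatureExtraction, OVWeightQuantizationConfig

# Export the original model to OpenVINO IR and apply INT8 weight quantization.
# Note: the precise quantization settings used for this checkpoint are an assumption here.
model = OVModelForFeatureExtraction.from_pretrained(
    "intfloat/multilingual-e5-base",
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=8),
)
model.save_pretrained("multilingual-e5-base-int8-ov")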
Usage
import torch
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForFeatureExtraction
# Sentences we want sentence embeddings for.
# E5 models expect each input text to start with "query: " or "passage: ".
sentences = ["query: Sample Data-1", "query: Sample Data-2"]
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('santhosh/multilingual-e5-base-int8-ov')
model = OVModelForFeatureExtraction.from_pretrained('santhosh/multilingual-e5-base-int8-ov')
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute token embeddings
model_output = model(**encoded_input)
# Perform pooling. E5 models use average pooling over token embeddings,
# masked by the attention mask.
attention_mask = encoded_input['attention_mask']
last_hidden = model_output[0].masked_fill(~attention_mask[..., None].bool(), 0.0)
sentence_embeddings = last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
# normalize embeddings
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
Using OpenVINO GenAI
import openvino_genai
import numpy as np
import huggingface_hub as hf_hub

# Download the OpenVINO IR files from the HuggingFace Hub to a local directory
model_id = "santhosh/multilingual-e5-base-int8-ov"
model_path = "multilingual-e5-base-int8-ov"
hf_hub.snapshot_download(model_id, local_dir=model_path)

sentences = ["Sample Data-1", "Sample Data-2"]

# Build the embedding pipeline on CPU and embed the documents
embedding_pipeline = openvino_genai.TextEmbeddingPipeline(model_path, "CPU")
embeddings = np.array(embedding_pipeline.embed_documents(sentences))
print("Sentence embeddings:", embeddings)
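Recent OpenVINO GenAI releases also expose an embed_query method on the pipeline. A minimal sketch of scoring the documents against a single query with cosine similarity, reusing the pipeline and embeddings from above ("query: " is the E5 query prefix):

# Embed one query and rank the documents by cosine similarity
query_embedding = np.array(embedding_pipeline.embed_query("query: Sample Data-1"))

doc_matrix = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
query_vec = query_embedding / np.linalg.norm(query_embedding)
scores = doc_matrix @ query_vec
print("Similarity scores:", scores)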
Evaluation results
- MTEB AmazonCounterfactualClassification (en), test set, self-reported: accuracy 78.970, ap 43.694, f1 73.381
- MTEB AmazonCounterfactualClassification (de), test set, self-reported: accuracy 71.724, ap 82.221, f1 69.955
- MTEB AmazonCounterfactualClassification (en-ext), test set, self-reported: accuracy 79.655, ap 28.508, f1 66.845
- MTEB AmazonCounterfactualClassification (ja), test set, self-reported: accuracy 73.330