---
language:
- en
license: apache-2.0
tags:
- biencoder
- sentence-transformers
- text-classification
- sentence-pair-classification
- semantic-similarity
- semantic-search
- retrieval
- reranking
- generated_from_trainer
- dataset_size:483820
- loss:MultipleNegativesSymmetricRankingLoss
base_model: Alibaba-NLP/gte-modernbert-base
widget:
- source_sentence: 'See Precambrian time scale # Proposed Geologic timeline for another set of periods 4600 -- 541 MYA .'
  sentences:
  - In 2014 election , Biju Janata Dal candidate Tathagat Satapathy Bharatiya Janata party candidate Rudra Narayan Pany defeated with a margin of 1.37,340 votes .
  - In Scotland , the Strathclyde Partnership for Transport , formerly known as Strathclyde Passenger Transport Executive , comprises the former Strathclyde region , which includes the urban area around Glasgow .
  - 'See Precambrian Time Scale # Proposed Geological Timeline for another set of periods of 4600 -- 541 MYA .'
- source_sentence: It is also 5 kilometers northeast of Tamaqua , 27 miles south of Allentown and 9 miles northwest of Hazleton .
  sentences:
  - In 1948 he moved to Massachusetts , and eventually settled in Vermont .
  - Suddenly I remembered that I was a New Zealander , I caught the first plane home and came back .
  - It is also 5 miles northeast of Tamaqua , 27 miles south of Allentown , and 9 miles northwest of Hazleton .
- source_sentence: The party has a Member of Parliament , a member of the House of Lords , three members of the London Assembly and two Members of the European Parliament .
  sentences:
  - The party has one Member of Parliament , one member of the House of Lords , three Members of the London Assembly and two Members of the European Parliament .
  - Grapsid crabs dominate in Australia , Malaysia and Panama , while gastropods Cerithidea scalariformis and Melampus coeffeus are important seed predators in Florida mangroves .
  - Music Story is a music service website and international music data provider that curates , aggregates and analyses metadata for digital music services .
- source_sentence: 'The play received two 1969 Tony Award nominations : Best Actress in a Play ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .'
  sentences:
  - Ravishanker is a fellow of the International Statistical Institute and an elected member of the American Statistical Association .
  - 'In 1969 , the play received two Tony - Award nominations : Best Actress in a Theatre Play ( Michael Annals ) and Best Costume Design ( Charlotte Rae ) .'
  - AMD and Nvidia both have proprietary methods of scaling , CrossFireX for AMD , and SLI for Nvidia .
- source_sentence: He was a close friend of Ángel Cabrera and is a cousin of golfer Tony Croatto .
  sentences:
  - He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto .
  - Eugenijus Bartulis ( born December 7 , 1949 in Kaunas ) is a Lithuanian Roman Catholic priest , and Bishop of Šiauliai .
  - UWIRE also distributes its members content to professional media outlets , including Yahoo , CNN and CBS News .
datasets:
- redis/langcache-sentencepairs-v1
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_precision@1
- cosine_recall@1
- cosine_ndcg@10
- cosine_mrr@1
- cosine_map@100
model-index:
- name: Redis fine-tuned BiEncoder model for semantic caching on LangCache
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: train
      type: train
    metrics:
    - type: cosine_accuracy@1
      value: 0.5579129681749296
      name: Cosine Accuracy@1
    - type: cosine_precision@1
      value: 0.5579129681749296
      name: Cosine Precision@1
    - type: cosine_recall@1
      value: 0.5359784831006956
      name: Cosine Recall@1
    - type: cosine_ndcg@10
      value: 0.7522148521266401
      name: Cosine Ndcg@10
    - type: cosine_mrr@1
      value: 0.5579129681749296
      name: Cosine Mrr@1
    - type: cosine_map@100
      value: 0.6974638651409195
      name: Cosine Map@100
---

# Redis fine-tuned BiEncoder model for semantic caching on LangCache

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) on the [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for sentence pair similarity.

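
Because this card positions the model for semantic caching, a minimal sketch of a similarity-based cache lookup may help make the idea concrete. The `SemanticCache` class, the `toy_embed` stand-in embedder, and the 0.9 threshold are all assumptions made for illustration; none of them come from this model card or from LangCache itself.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


class SemanticCache:
    """Toy in-memory cache keyed by embedding similarity (hypothetical helper)."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # callable: str -> 1-D np.ndarray
        self.threshold = threshold  # assumed cutoff; tune it on your own data
        self.entries = []           # list of (embedding, cached_response)

    def put(self, prompt, response):
        """Store a response under the embedding of its prompt."""
        self.entries.append((self.embed(prompt), response))

    def get(self, prompt):
        """Return the response of the most similar cached prompt, or None."""
        if not self.entries:
            return None
        query = self.embed(prompt)
        score, response = max(
            ((cosine_similarity(query, emb), resp) for emb, resp in self.entries),
            key=lambda pair: pair[0],
        )
        return response if score >= self.threshold else None


_vocab = {}  # word -> index, grown on first sight (illustration only)


def toy_embed(text, dim=64):
    """Stand-in bag-of-words embedder, NOT the real model."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        idx = _vocab.setdefault(word, len(_vocab)) % dim
        vec[idx] += 1.0
    return vec


if __name__ == "__main__":
    cache = SemanticCache(toy_embed)
    cache.put("what is the capital of france", "Paris")
    print(cache.get("What is the capital of France"))  # same words, cache hit
    print(cache.get("how do i reset my password"))     # unrelated, cache miss
```

With the real model, the embedder could plausibly be `lambda text: model.encode(text)`, since `encode` returns a single vector when given a single string. A suitable threshold depends on the workload and should be chosen empirically.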
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base)
- **Maximum Sequence Length:** 100 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 100, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("redis/langcache-embed-v3")
# Run inference
sentences = [
    'He was a close friend of Ángel Cabrera and is a cousin of golfer Tony Croatto .',
    'He was a close friend of Ángel Cabrera , and is a cousin of golfer Tony Croatto .',
    'UWIRE also distributes its members content to professional media outlets , including Yahoo , CNN and CBS News .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[0.9922, 0.9922, 0.5352],
#         [0.9922, 0.9961, 0.5391],
#         [0.5352, 0.5391, 1.0000]], dtype=torch.bfloat16)
```

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `train`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric             | Value      |
|:-------------------|:-----------|
| cosine_accuracy@1  | 0.5579     |
| cosine_precision@1 | 0.5579     |
| cosine_recall@1    | 0.536      |
| **cosine_ndcg@10** | **0.7522** |
| cosine_mrr@1       | 0.5579     |
| cosine_map@100     | 0.6975     |

## Training Details

### Training Dataset

#### LangCache Sentence Pairs (all)

* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
* Size: 26,850 training samples
* Columns: sentence1, sentence2, and label
* Approximate statistics based on the first 1000 samples:

|         | sentence1 | sentence2 | label |
|:--------|:----------|:----------|:------|
| type    | string    | string    | int   |
| details |           |           |       |

* Samples:

| sentence1 | sentence2 | label |
|:----------|:----------|:------|
| The newer Punts are still very much in existence today and race in the same fleets as the older boats . | The newer punts are still very much in existence today and run in the same fleets as the older boats . | 1 |
| After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall . | Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall . | 1 |
| The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 . | The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon . | 1 |

* Loss: [MultipleNegativesSymmetricRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```

### Evaluation Dataset

#### LangCache Sentence Pairs (all)

* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
* Size: 26,850 evaluation samples
* Columns: sentence1, sentence2, and label
* Approximate statistics based on the first 1000 samples:

|         | sentence1 | sentence2 | label |
|:--------|:----------|:----------|:------|
| type    | string    | string    | int   |
| details |           |           |       |

* Samples:

| sentence1 | sentence2 | label |
|:----------|:----------|:------|
| The newer Punts are still very much in existence today and race in the same fleets as the older boats . | The newer punts are still very much in existence today and run in the same fleets as the older boats . | 1 |
| After losing his second election , he resigned as opposition leader and was replaced by Geoff Pearsall . | Max Bingham resigned as opposition leader after losing his second election , and was replaced by Geoff Pearsall . | 1 |
| The 12F was officially homologated on August 21 , 1929 and exhibited at the Paris Salon in 1930 . | The 12F was officially homologated on 21 August 1929 and displayed at the 1930 Paris Salon . | 1 |

* Loss: [MultipleNegativesSymmetricRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false
}
```

### Training Logs

| Epoch | Step | train_cosine_ndcg@10 |
|:-----:|:----:|:--------------------:|
| -1    | -1   | 0.7522               |

### Framework Versions

- Python: 3.12.3
- Sentence Transformers: 5.1.0
- Transformers: 4.56.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
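
As a supplementary note on the `MultipleNegativesSymmetricRankingLoss` settings listed under Training Details (`scale` of 20.0 with `cos_sim`), the objective can be illustrated with a small NumPy sketch. This is a simplified re-derivation, assuming each anchor's positive sits at the same batch index and all other in-batch items act as negatives; it is not the library's implementation, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_ranking_loss(anchors, positives, scale=20.0):
    """In-batch symmetric ranking loss over scaled cosine similarities.

    anchors, positives: arrays of shape (batch, dim); row i of `positives`
    is assumed to be the match for row i of `anchors`.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) scaled cosine similarities
    idx = np.arange(len(scores))
    # Anchor -> positive direction: cross-entropy over each row.
    forward = -log_softmax(scores, axis=1)[idx, idx].mean()
    # Positive -> anchor direction: cross-entropy over each column.
    backward = -log_softmax(scores, axis=0)[idx, idx].mean()
    return float((forward + backward) / 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(4, 8))
    # Positives that are small perturbations of their anchors give a low loss.
    print(symmetric_ranking_loss(anchors, anchors + 0.01 * rng.normal(size=(4, 8))))
```

The `scale` factor sharpens the softmax over similarities, so with a value of 20.0 a well-matched pair dominates its row and column and the loss approaches zero.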