pawasthy committed on
Commit 2cceebd · verified · 1 Parent(s): eabf46d

Update README.md

Files changed (1):
  1. README.md +48 -36
README.md CHANGED
@@ -16,12 +16,14 @@ tags:

 **Model Summary:** Granite-embedding-english-r2 is a 149M parameter dense bi-encoder embedding model from the Granite Embeddings collection that can be used to generate high-quality text embeddings. This model produces embedding vectors of size 768 based on a context length of up to 8192 tokens. Unlike most other open-source models, this model was trained only on open-source relevance-pair datasets with permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets.

- The r2 models feature an increased context length of 8192 and deliver superior performance across standard and IBM-built information retrieval benchmarks (BEIR, ClapNQ), code retrieval (COIR), long-document search benchmarks (MLDR), conversational multi-turn (MTRAG), TableIR (TBD), and on many enterprise use cases.

 These models use a bi-encoder architecture to generate high-quality embeddings from text inputs such as queries, passages, and documents, enabling seamless comparison through cosine similarity. Built using retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, granite-embedding-english-r2 is optimized to ensure strong alignment between query and passage embeddings.

 The latest granite embedding r2 release introduces two English embedding models, both based on the ModernBERT architecture:
- - _granite-embedding-english-r2_ (**149M** parameters): with an output embedding size of _768_, replacing _granite-embedding-125m-english_.
 - _granite-embedding-small-english-r2_ (**47M** parameters): a _first-of-its-kind_ reduced-size model, with fewer layers and a smaller output embedding size (_384_), replacing _granite-embedding-30m-english_.

  ## Model Details
@@ -30,11 +32,18 @@ The latest granite embedding r2 release introduces two English embedding models,
 - **Repository:** [ibm-granite/granite-embedding-models](https://github.com/ibm-granite/granite-embedding-models)
 - **Paper:** Coming Soon
 - **Language(s) (NLP):** English
- - **Release Date**: July 31, 2024
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

 **Intended Use:** The model is designed to produce fixed-length vector representations for a given text, which can be used for text similarity, retrieval, and search applications.
-

 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

@@ -117,31 +126,33 @@ query_embeddings = torch.nn.functional.normalize(query_embeddings, dim=1)
  ```
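
The hunk header above shows the tail of the model card's usage example, where query embeddings are L2-normalized with `torch.nn.functional.normalize`. As an editorial aside (not part of this commit), here is a minimal sketch of how such normalized embeddings are typically scored for retrieval; `passage_embeddings` is assumed to be produced and normalized the same way as `query_embeddings`, which is not shown in the excerpt.

```python
import torch

def rank_passages(query_embeddings: torch.Tensor, passage_embeddings: torch.Tensor):
    """Score L2-normalized query embeddings against L2-normalized passage embeddings."""
    # With unit-length vectors, cosine similarity reduces to a dot product.
    scores = query_embeddings @ passage_embeddings.T        # (num_queries, num_passages)
    ranked = torch.argsort(scores, dim=1, descending=True)  # best-matching passage first
    return scores, ranked

if __name__ == "__main__":
    # Stand-in tensors with the 768-dim size this model produces (illustration only).
    q = torch.nn.functional.normalize(torch.randn(2, 768), dim=1)
    p = torch.nn.functional.normalize(torch.randn(5, 768), dim=1)
    scores, ranked = rank_passages(q, p)
    print(scores.shape, ranked[:, 0])
```
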

 ## Evaluation Results
- The performance of the granite embedding r2 models on MTEB Retrieval (i.e., BEIR) and code retrieval (CoIR) benchmarks is reported below.
- The average speed to encode documents on a single 5090 GPU using a sliding window with 512 context length is also reported.

- | Model | Parameters (M) | Embedding Size | BEIR Retrieval (15) | MTEB-v2 (56) | CoIR (10) | MLDR (En) | MTRAG (4) | Encoding Speed (documents/sec) |
 |------------------------------------|:--------------:|:--------------:|:-------------------:|:-----------:|:---------:|:---------:|:---------:|:-------------------------------:|
- | granite-embedding-30m-english | 30 | 384 | 49.1 | 59.45 | 47.0 | 32.6 | 48.61 | 140.8 |
- | granite-embedding-125m-english | 125 | 768 | 52.3 | 61.37 | 50.3 | 35.0 | 49.37 | 80.7 |
- | granite-embedding-small-english-r2 | 47 | 384 | 50.8 | 60.38 | 53.8 | 39.8 | 48.11 | 138.8 |
- | granite-embedding-english-r2 | 149 | 768 | 53.0 | 62.18 | 55.3 | 40.7 | 56.73 | 80.9 |
-
-
- | Model | Parameters (M) | Embedding Size | Average | MTEB-v2 Retrieval (10) | CoIR (10) | MLDR (En) | Table IR | MTRAG |
- |------------------------------------|:--------------:|:--------------:| ------- |:----------------------:|:---------:|:---------:|:--------:|:-----:|
- | gte-modernbert-base | | | | | | | | |
- | nomic-ai/modernbert-embed-base | | | | | | | | |
- | snowflake-arctic-embed-m-v2.0 | | | | | | | | |
- | gte-base-en-v1.5 | | | | | | | | |
- | e5-base-v2 | | | | | | | | |
- | e5-small-v2 | | | | | | | | |
- | bge-base-en-v1.5 | | | | | | | | |
- | bge-small-en-v1.5 | | | | | | | | |
- | granite-embedding-125m-english | | | | | | | | |
- | granite-embedding-30m-english | | | | | | | | |
- | granite-embedding-english-r2 | | | | | | | | |
- | granite-embedding-small-english-r2 | | | | | | | | |

  ### Model Architecture and Key Features
 
@@ -151,16 +162,16 @@ The latest granite embedding r2 release introduces two English embedding models,

 The following table shows the structure of the two models:

- | Model | granite-embedding-small-english-r2 | granite-embedding-english-r2 |
 | :--------- | :-------:|:--------:|
- | Embedding size | 384 | 768 |
- | Number of layers | 12 | 22 |
- | Number of attention heads | 12 | 12 |
- | Intermediate size | 1536 | 1152 |
- | Activation Function | GeGLU | GeGLU |
- | Vocabulary Size | 50368 | 50368 |
- | Max. Sequence Length | 8192 | 8192 |
- | # Parameters | 47M | 149M |

  ### Training and Optimization
@@ -208,3 +219,4 @@ Granite-embedding-english-r2 leverages both permissively licensed open-source an
   url={https://arxiv.org/abs/2502.20204},
 }
  ```
 
 

 **Model Summary:** Granite-embedding-english-r2 is a 149M parameter dense bi-encoder embedding model from the Granite Embeddings collection that can be used to generate high-quality text embeddings. This model produces embedding vectors of size 768 based on a context length of up to 8192 tokens. Unlike most other open-source models, this model was trained only on open-source relevance-pair datasets with permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets.

+ The r2 models show strong performance across standard and IBM-built information retrieval benchmarks (BEIR, ClapNQ),
+ code retrieval (COIR), long-document search benchmarks (MLDR, LongEmbed), conversational multi-turn retrieval (MTRAG),
+ table retrieval (NQTables, OTT-QA, AIT-QA, MultiHierTT, OpenWikiTables), and many enterprise use cases.

 These models use a bi-encoder architecture to generate high-quality embeddings from text inputs such as queries, passages, and documents, enabling seamless comparison through cosine similarity. Built using retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, granite-embedding-english-r2 is optimized to ensure strong alignment between query and passage embeddings.

 The latest granite embedding r2 release introduces two English embedding models, both based on the ModernBERT architecture:
+ - **_granite-embedding-english-r2_** (**149M** parameters): with an output embedding size of _768_, replacing _granite-embedding-125m-english_.
 - _granite-embedding-small-english-r2_ (**47M** parameters): a _first-of-its-kind_ reduced-size model, with fewer layers and a smaller output embedding size (_384_), replacing _granite-embedding-30m-english_.

  ## Model Details
 
 - **Repository:** [ibm-granite/granite-embedding-models](https://github.com/ibm-granite/granite-embedding-models)
 - **Paper:** Coming Soon
 - **Language(s) (NLP):** English
+ - **Release Date**: Aug 15, 2025
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

+ ## Usage
+
 **Intended Use:** The model is designed to produce fixed-length vector representations for a given text, which can be used for text similarity, retrieval, and search applications.
+
+ For efficient encoding, these models use Flash Attention 2. Installing it is optional, but can lead to faster inference.
+
+ ```shell
+ pip install flash_attn==2.6.1
+ ```
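
To make the "Intended Use" description above concrete, the following is a minimal editorial sketch (not part of this commit) of encoding queries and passages and comparing them by cosine similarity. It assumes the model is published on the Hugging Face Hub as `ibm-granite/granite-embedding-english-r2` and is loadable with the `sentence-transformers` library, following the pattern of earlier granite-embedding releases; the model card's own usage section remains authoritative.

```python
# Editorial sketch: bi-encoder retrieval with cosine similarity.
# The model id and sentence-transformers support are assumptions, not taken from this diff.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")

queries = ["what is the capital of France?"]
passages = [
    "Paris is the capital and largest city of France.",
    "Granite embedding models produce fixed-length text embeddings.",
]

# Queries and passages are embedded independently into the same vector space.
query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

# Cosine similarity between every query and every passage.
scores = util.cos_sim(query_emb, passage_emb)
print(scores)
```
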

 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ```

 ## Evaluation Results
+ Granite embedding r2 models show strong performance across diverse tasks.
+
+ Performance of the granite models on MTEB Retrieval (i.e., BEIR), MTEB-v2, code retrieval (CoIR), long-document search (MLDR, LongEmbed), conversational multi-turn (MTRAG),
+ and table retrieval (NQTables, OTT-QA, AIT-QA, MultiHierTT, OpenWikiTables) benchmarks is reported in the tables below.
+
+ The r2 models demonstrate speed and efficiency while maintaining competitive performance. The average speed to encode documents on a single H100 GPU, using a sliding window of 512-token chunks, is also reported (see the illustrative chunking sketch after the tables).
+
+ | Model | Parameters (M) | Embedding Size | BEIR Retrieval (15) | MTEB-v2 (41) | CoIR (10) | MLDR (En) | MTRAG (4) | Encoding Speed (docs/sec) |
 |------------------------------------|:--------------:|:--------------:|:-------------------:|:-----------:|:---------:|:---------:|:---------:|:-------------------------------:|
+ | granite-embedding-125m-english | 125 | 768 | 52.3 | 62.1 | 50.3 | 35.0 | 49.4 | 149 |
+ | granite-embedding-30m-english | 30 | 384 | 49.1 | 60.2 | 47.0 | 32.6 | 48.6 | 198 |
+ | granite-embedding-english-r2 | 149 | 768 | 53.1 | 62.8 | 55.3 | 40.7 | 56.7 | 144 |
+ | granite-embedding-small-english-r2 | 47 | 384 | 50.9 | 61.1 | 53.8 | 39.8 | 48.1 | 199 |
+
+
+ | Model | Parameters (M) | Embedding Size | **AVERAGE** | MTEB-v2 Retrieval (10) | CoIR (10) | MLDR (En) | LongEmbed (6) | Table IR (5) | MTRAG (4) | Encoding Speed (docs/sec) |
+ |-----------------------------------|:--------------:|:--------------:|:---------:|:---------------------:|:---------:|:---------:|:------------:|:-----------:|:--------:|-------------------------------:|
+ | e5-base-v2 | 109 | 768 | 47.5 | 49.7 | 50.3 | 32.5 | 41.1 | 74.09 | 37.0 | 115 |
+ | bge-base-en-v1.5 | 109 | 768 | 46.9 | 54.8 | 46.6 | 33.5 | 33.9 | 73.98 | 38.8 | 116 |
+ | snowflake-arctic-embed-m-v2.0 | 305 | 768 | 51.4 | 58.4 | 52.2 | 32.4 | 55.4 | 80.75 | 29.2 | 73 |
+ | gte-base-en-v1.5 | 137 | 768 | 52.8 | 55.5 | 42.4 | 42.7 | 59.4 | 80.52 | 36.0 | 116 |
+ | gte-modernbert-base | 149 | 768 | 57.5 | 57.0 | 71.5 | 46.2 | 57.0 | 76.68 | 36.8 | 88 |
+ | nomic-ai/modernbert-embed-base | 149 | 768 | 48.0 | 48.7 | 48.8 | 31.3 | 56.3 | 66.69 | 36.2 | 87 |
+ ||||||||||||
+ | granite-embedding-english-r2 | 149 | 768 | **59.5** | 56.4 | 54.8 | 41.6 | 67.8 | 78.53 | 57.6 | 144 |
+ | granite-embedding-small-english-r2 | 47 | 384 | 55.6 | 53.9 | 53.4 | 40.1 | 61.9 | 75.51 | 48.9 | 199 |
+
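
The encoding-speed numbers above are measured with a sliding window of 512-token chunks. As an editorial illustration only (not the benchmarking code behind the table), the sketch below shows one way to split long documents into overlapping 512-token windows with a Hugging Face tokenizer; the tokenizer id and the stride value are assumptions.

```python
# Editorial sketch: chunk long documents into 512-token windows before encoding.
# The repo id and stride are illustrative assumptions, not taken from this diff.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-embedding-english-r2")

def chunk_document(text: str, max_length: int = 512, stride: int = 128):
    # return_overflowing_tokens=True yields successive windows of at most
    # max_length tokens, each overlapping the previous one by `stride` tokens.
    enc = tokenizer(
        text,
        max_length=max_length,
        truncation=True,
        stride=stride,
        return_overflowing_tokens=True,
    )
    return [tokenizer.decode(ids, skip_special_tokens=True) for ids in enc["input_ids"]]

chunks = chunk_document("A very long document ... " * 1000)
print(len(chunks), "chunks of at most 512 tokens")
```
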
 
 
 ### Model Architecture and Key Features

 The following table shows the structure of the two models:

+ | Model | granite-embedding-small-english-r2 | **granite-embedding-english-r2** |
 | :--------- | :-------:|:--------:|
+ | Embedding size | 384 | **768** |
+ | Number of layers | 12 | **22** |
+ | Number of attention heads | 12 | **12** |
+ | Intermediate size | 1536 | **1152** |
+ | Activation Function | GeGLU | **GeGLU** |
+ | Vocabulary Size | 50368 | **50368** |
+ | Max. Sequence Length | 8192 | **8192** |
+ | # Parameters | 47M | **149M** |

 ### Training and Optimization

   url={https://arxiv.org/abs/2502.20204},
 }
 ```
+