pawasthy committed on
Commit 2cceebd · verified · 1 Parent(s): eabf46d

Update README.md

Files changed (1):
  1. README.md +48 -36
README.md CHANGED
@@ -16,12 +16,14 @@ tags:

 **Model Summary:** Granite-embedding-english-r2 is a 149M parameter dense bi-encoder embedding model from the Granite Embeddings collection that can be used to generate high-quality text embeddings. This model produces embedding vectors of size 768 based on a context length of up to 8192 tokens. Unlike most other open-source models, this model was trained only on open-source relevance-pair datasets with permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets.

- The r2 models feature an increased context length of 8192 and deliver superior performance across standard and IBM-built information retrieval benchmarks (BEIR, ClapNQ), code retrieval (COIR), long-document search benchmarks (MLDR), conversational multi-turn (MTRAG), TableIR (TBD), and on many enterprise use cases.

 These models use a bi-encoder architecture to generate high-quality embeddings from text inputs such as queries, passages, and documents, enabling seamless comparison through cosine similarity. Built using retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, granite-embedding-english-r2 is optimized to ensure strong alignment between query and passage embeddings.

 The latest granite embedding r2 release introduces two English embedding models, both based on the ModernBERT architecture:
- - _granite-embedding-english-r2_ (**149M** parameters): with an output embedding size of _768_, replacing _granite-embedding-125m-english_.
 - _granite-embedding-small-english-r2_ (**47M** parameters): a _first-of-its-kind_ reduced-size model, with fewer layers and a smaller output embedding size (_384_), replacing _granite-embedding-30m-english_.

  ## Model Details
@@ -30,11 +32,18 @@ The latest granite embedding r2 release introduces two English embedding models,
 - **Repository:** [ibm-granite/granite-embedding-models](https://github.com/ibm-granite/granite-embedding-models)
 - **Paper:** Coming Soon
 - **Language(s) (NLP):** English
- - **Release Date**: July 31, 2024
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

 **Intended Use:** The model is designed to produce fixed-length vector representations for a given text, which can be used for text similarity, retrieval, and search applications.
-

 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

@@ -117,31 +126,33 @@ query_embeddings = torch.nn.functional.normalize(query_embeddings, dim=1)
  ```
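
The hunk header above shows the tail of the model card's usage example, where query embeddings are L2-normalized with `torch.nn.functional.normalize`. As an editorial aside (not part of this commit), here is a minimal sketch of how such normalized embeddings are typically scored for retrieval; `passage_embeddings` is assumed to be produced and normalized the same way as `query_embeddings`, which is not shown in the excerpt.

```python
import torch

def rank_passages(query_embeddings: torch.Tensor, passage_embeddings: torch.Tensor):
    """Score L2-normalized query embeddings against L2-normalized passage embeddings."""
    # With unit-length vectors, cosine similarity reduces to a dot product.
    scores = query_embeddings @ passage_embeddings.T        # (num_queries, num_passages)
    ranked = torch.argsort(scores, dim=1, descending=True)  # best-matching passage first
    return scores, ranked

if __name__ == "__main__":
    # Stand-in tensors with the 768-dim size this model produces (illustration only).
    q = torch.nn.functional.normalize(torch.randn(2, 768), dim=1)
    p = torch.nn.functional.normalize(torch.randn(5, 768), dim=1)
    scores, ranked = rank_passages(q, p)
    print(scores.shape, ranked[:, 0])
```
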

 ## Evaluation Results
- The performance of the granite embedding r2 models on MTEB Retrieval (i.e., BEIR) and code retrieval (CoIR) benchmarks is reported below.
- The average speed to encode documents on a single 5090 GPU using a sliding window with 512 context length is also reported.

- | Model | Parameters (M) | Embedding Size | BEIR Retrieval (15) | MTEB-v2 (56) | CoIR (10) | MLDR (En) | MTRAG (4) | Encoding Speed (documents/sec) |
 |------------------------------------|:--------------:|:--------------:|:-------------------:|:-----------:|:---------:|:---------:|:---------:|:-------------------------------:|
- | granite-embedding-30m-english | 30 | 384 | 49.1 | 59.45 | 47.0 | 32.6 | 48.61 | 140.8 |
- | granite-embedding-125m-english | 125 | 768 | 52.3 | 61.37 | 50.3 | 35.0 | 49.37 | 80.7 |
- | granite-embedding-small-english-r2 | 47 | 384 | 50.8 | 60.38 | 53.8 | 39.8 | 48.11 | 138.8 |
- | granite-embedding-english-r2 | 149 | 768 | 53.0 | 62.18 | 55.3 | 40.7 | 56.73 | 80.9 |
-
-
- | Model | Parameters (M) | Embedding Size | Average | MTEB-v2 Retrieval (10) | CoIR (10) | MLDR (En) | Table IR | MTRAG |
- |------------------------------------|:--------------:|:--------------:| ------- |:----------------------:|:---------:|:---------:|:--------:|:-----:|
- | gte-modernbert-base | | | | | | | | |
- | nomic-ai/modernbert-embed-base | | | | | | | | |
- | snowflake-arctic-embed-m-v2.0 | | | | | | | | |
- | gte-base-en-v1.5 | | | | | | | | |
- | e5-base-v2 | | | | | | | | |
- | e5-small-v2 | | | | | | | | |
- | bge-base-en-v1.5 | | | | | | | | |
- | bge-small-en-v1.5 | | | | | | | | |
- | granite-embedding-125m-english | | | | | | | | |
- | granite-embedding-30m-english | | | | | | | | |
- | granite-embedding-english-r2 | | | | | | | | |
- | granite-embedding-small-english-r2 | | | | | | | | |

  ### Model Architecture and Key Features
 
@@ -151,16 +162,16 @@ The latest granite embedding r2 release introduces two English embedding models,

 The following table shows the structure of the two models:

- | Model | granite-embedding-small-english-r2 | granite-embedding-english-r2 |
 | :--------- | :-------:|:--------:|
- | Embedding size | 384 | 768 |
- | Number of layers | 12 | 22 |
- | Number of attention heads | 12 | 12 |
- | Intermediate size | 1536 | 1152 |
- | Activation Function | GeGLU | GeGLU |
- | Vocabulary Size | 50368 | 50368 |
- | Max. Sequence Length | 8192 | 8192 |
- | # Parameters | 47M | 149M |

  ### Training and Optimization
@@ -208,3 +219,4 @@ Granite-embedding-english-r2 leverages both permissively licensed open-source an
   url={https://arxiv.org/abs/2502.20204},
 }
  ```
 
 

 **Model Summary:** Granite-embedding-english-r2 is a 149M parameter dense bi-encoder embedding model from the Granite Embeddings collection that can be used to generate high-quality text embeddings. This model produces embedding vectors of size 768 based on a context length of up to 8192 tokens. Unlike most other open-source models, this model was trained only on open-source relevance-pair datasets with permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets.

+ The r2 models show strong performance across standard and IBM-built information retrieval benchmarks (BEIR, ClapNQ),
+ code retrieval (COIR), long-document search benchmarks (MLDR, LongEmbed), conversational multi-turn retrieval (MTRAG),
+ table retrieval (NQTables, OTT-QA, AIT-QA, MultiHierTT, OpenWikiTables), and many enterprise use cases.

 These models use a bi-encoder architecture to generate high-quality embeddings from text inputs such as queries, passages, and documents, enabling seamless comparison through cosine similarity. Built using retrieval-oriented pretraining, contrastive finetuning, knowledge distillation, and model merging, granite-embedding-english-r2 is optimized to ensure strong alignment between query and passage embeddings.

 The latest granite embedding r2 release introduces two English embedding models, both based on the ModernBERT architecture:
+ - **_granite-embedding-english-r2_** (**149M** parameters): with an output embedding size of _768_, replacing _granite-embedding-125m-english_.
 - _granite-embedding-small-english-r2_ (**47M** parameters): a _first-of-its-kind_ reduced-size model, with fewer layers and a smaller output embedding size (_384_), replacing _granite-embedding-30m-english_.

  ## Model Details
 
 - **Repository:** [ibm-granite/granite-embedding-models](https://github.com/ibm-granite/granite-embedding-models)
 - **Paper:** Coming Soon
 - **Language(s) (NLP):** English
+ - **Release Date**: Aug 15, 2025
 - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

+ ## Usage
+
 **Intended Use:** The model is designed to produce fixed-length vector representations for a given text, which can be used for text similarity, retrieval, and search applications.
+
+ For efficient encoding, these models use Flash Attention 2. Installing it is optional, but can lead to faster inference.
+
+ ```shell
+ pip install flash_attn==2.6.1
+ ```
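
To make the "Intended Use" description above concrete, the following is a minimal editorial sketch (not part of this commit) of encoding queries and passages and comparing them by cosine similarity. It assumes the model is published on the Hugging Face Hub as `ibm-granite/granite-embedding-english-r2` and is loadable with the `sentence-transformers` library, following the pattern of earlier granite-embedding releases; the model card's own usage section remains authoritative.

```python
# Editorial sketch: bi-encoder retrieval with cosine similarity.
# The model id and sentence-transformers support are assumptions, not taken from this diff.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")

queries = ["what is the capital of France?"]
passages = [
    "Paris is the capital and largest city of France.",
    "Granite embedding models produce fixed-length text embeddings.",
]

# Queries and passages are embedded independently into the same vector space.
query_emb = model.encode(queries, normalize_embeddings=True)
passage_emb = model.encode(passages, normalize_embeddings=True)

# Cosine similarity between every query and every passage.
scores = util.cos_sim(query_emb, passage_emb)
print(scores)
```
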

 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  ```

 ## Evaluation Results
+ Granite embedding r2 models show strong performance across diverse tasks.
+
+ Performance of the granite models on MTEB Retrieval (i.e., BEIR), MTEB-v2, code retrieval (CoIR), long-document search (MLDR, LongEmbed), conversational multi-turn (MTRAG),
+ and table retrieval (NQTables, OTT-QA, AIT-QA, MultiHierTT, OpenWikiTables) benchmarks is reported in the tables below.
+
+ The r2 models demonstrate speed and efficiency while maintaining competitive performance. The average speed to encode documents on a single H100 GPU, using a sliding window of 512-token chunks, is also reported (see the illustrative chunking sketch after the tables).
+
+ | Model | Parameters (M) | Embedding Size | BEIR Retrieval (15) | MTEB-v2 (41) | CoIR (10) | MLDR (En) | MTRAG (4) | Encoding Speed (docs/sec) |
 |------------------------------------|:--------------:|:--------------:|:-------------------:|:-----------:|:---------:|:---------:|:---------:|:-------------------------------:|
+ | granite-embedding-125m-english | 125 | 768 | 52.3 | 62.1 | 50.3 | 35.0 | 49.4 | 149 |
+ | granite-embedding-30m-english | 30 | 384 | 49.1 | 60.2 | 47.0 | 32.6 | 48.6 | 198 |
+ | granite-embedding-english-r2 | 149 | 768 | 53.1 | 62.8 | 55.3 | 40.7 | 56.7 | 144 |
+ | granite-embedding-small-english-r2 | 47 | 384 | 50.9 | 61.1 | 53.8 | 39.8 | 48.1 | 199 |
+
+
+ | Model | Parameters (M) | Embedding Size | **AVERAGE** | MTEB-v2 Retrieval (10) | CoIR (10) | MLDR (En) | LongEmbed (6) | Table IR (5) | MTRAG (4) | Encoding Speed (docs/sec) |
+ |-----------------------------------|:--------------:|:--------------:|:---------:|:---------------------:|:---------:|:---------:|:------------:|:-----------:|:--------:|-------------------------------:|
+ | e5-base-v2 | 109 | 768 | 47.5 | 49.7 | 50.3 | 32.5 | 41.1 | 74.09 | 37.0 | 115 |
+ | bge-base-en-v1.5 | 109 | 768 | 46.9 | 54.8 | 46.6 | 33.5 | 33.9 | 73.98 | 38.8 | 116 |
+ | snowflake-arctic-embed-m-v2.0 | 305 | 768 | 51.4 | 58.4 | 52.2 | 32.4 | 55.4 | 80.75 | 29.2 | 73 |
+ | gte-base-en-v1.5 | 137 | 768 | 52.8 | 55.5 | 42.4 | 42.7 | 59.4 | 80.52 | 36.0 | 116 |
+ | gte-modernbert-base | 149 | 768 | 57.5 | 57.0 | 71.5 | 46.2 | 57.0 | 76.68 | 36.8 | 88 |
+ | nomic-ai/modernbert-embed-base | 149 | 768 | 48.0 | 48.7 | 48.8 | 31.3 | 56.3 | 66.69 | 36.2 | 87 |
+ ||||||||||||
+ | granite-embedding-english-r2 | 149 | 768 | **59.5** | 56.4 | 54.8 | 41.6 | 67.8 | 78.53 | 57.6 | 144 |
+ | granite-embedding-small-english-r2 | 47 | 384 | 55.6 | 53.9 | 53.4 | 40.1 | 61.9 | 75.51 | 48.9 | 199 |
+
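
The encoding-speed numbers above are measured with a sliding window of 512-token chunks. As an editorial illustration only (not the benchmarking code behind the table), the sketch below shows one way to split long documents into overlapping 512-token windows with a Hugging Face tokenizer; the tokenizer id and the stride value are assumptions.

```python
# Editorial sketch: chunk long documents into 512-token windows before encoding.
# The repo id and stride are illustrative assumptions, not taken from this diff.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-embedding-english-r2")

def chunk_document(text: str, max_length: int = 512, stride: int = 128):
    # return_overflowing_tokens=True yields successive windows of at most
    # max_length tokens, each overlapping the previous one by `stride` tokens.
    enc = tokenizer(
        text,
        max_length=max_length,
        truncation=True,
        stride=stride,
        return_overflowing_tokens=True,
    )
    return [tokenizer.decode(ids, skip_special_tokens=True) for ids in enc["input_ids"]]

chunks = chunk_document("A very long document ... " * 1000)
print(len(chunks), "chunks of at most 512 tokens")
```
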
 
 
 ### Model Architecture and Key Features

 The following table shows the structure of the two models:

+ | Model | granite-embedding-small-english-r2 | **granite-embedding-english-r2** |
 | :--------- | :-------:|:--------:|
+ | Embedding size | 384 | **768** |
+ | Number of layers | 12 | **22** |
+ | Number of attention heads | 12 | **12** |
+ | Intermediate size | 1536 | **1152** |
+ | Activation Function | GeGLU | **GeGLU** |
+ | Vocabulary Size | 50368 | **50368** |
+ | Max. Sequence Length | 8192 | **8192** |
+ | # Parameters | 47M | **149M** |

 ### Training and Optimization

   url={https://arxiv.org/abs/2502.20204},
 }
 ```
+