Commit 5e2781f (verified) by ranarag · 1 Parent(s): 1be352d

Update README.md

added results and modified model summary.

Files changed (1)
  1. README.md +167 -71
README.md CHANGED
@@ -13,12 +13,13 @@ base_model:
13
  # Granite-3.3-2B-Instruct
14
 
15
  **Model Summary:**
16
- Granite-3.3-2B-Instruct is an 2-billion parameter long-context AI model fine-tuned for superior reasoning and instruction-following capabilities. Built on top of Granite-3.3-2B-Base, the model delivers significant gains on benchmarks for measuring generic performance including Alpaca and Arena-Hard, and significant improvements in mathematics, coding, and instruction following. It incorporates Fill-in-the-Middle (FIM) capabilities and supports structured reasoning through toggleable custom tags (\<think\>\<\/think\> and \<response\>\<\/response\>), providing clear separation between internal reasoning processes and final outputs. The model has been trained on a carefully balanced combination of permissively licensed data and curated synthetic tasks.
17
 
18
 
19
  - **Developers:** Granite Team, IBM
 
20
  - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
21
- - **Release Date**: April 10th, 2025
22
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
23
 
24
  **Supported Languages:**
@@ -37,6 +38,7 @@ This model is designed to handle general instruction-following tasks and can be
37
  * Code related tasks
38
  * Function-calling tasks
39
  * Multilingual dialog use cases
 
40
  * Long-context tasks including long document/meeting summarization, long document QA, etc.
41
 
42
 
@@ -82,6 +84,7 @@ prediction = tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_
82
  print(prediction)
83
  ```
84
 
 
85
  **Example Outputs**
86
  - thinking=True
87
  ```md
@@ -205,10 +208,11 @@ By implementing this innovative prevention strategy, we can significantly reduce
205
  <table>
206
 
207
  <thead>
 
208
  <tr>
209
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
210
  <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
211
- <th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
212
  <th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
213
  <th style="text-align:center; background-color: #001d6c; color: white;">PopQA</th>
214
  <th style="text-align:center; background-color: #001d6c; color: white;">TruthfulQA</th>
@@ -218,8 +222,55 @@ By implementing this innovative prevention strategy, we can significantly reduce
218
  <th style="text-align:center; background-color: #001d6c; color: white;">HumanEval</th>
219
  <th style="text-align:center; background-color: #001d6c; color: white;">HumanEval+</th>
220
  <th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
 
221
  </tr></thead>
222
  <tbody>
223
  <tr>
224
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Llama-3.1-8B-Instruct</td>
225
  <td style="text-align:center; background-color: #DAE8FF; color: black;">36.43</td>
@@ -233,6 +284,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
233
  <td style="text-align:center; background-color: #DAE8FF; color: black;">85.32</td>
234
  <td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
235
  <td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
 
236
 
237
  </tr>
238
 
@@ -249,7 +301,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
249
  <td style="text-align:center; background-color: #DAE8FF; color: black;">67.54</td>
250
  <td style="text-align:center; background-color: #DAE8FF; color: black;">62.91</td>
251
  <td style="text-align:center; background-color: #DAE8FF; color: black;">66.50</td>
252
-
253
  </tr>
254
 
255
  <tr>
@@ -265,7 +317,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
265
  <td style="text-align:center; background-color: #DAE8FF; color: black;">93.35</td>
266
  <td style="text-align:center; background-color: #DAE8FF; color: black;">89.91</td>
267
  <td style="text-align:center; background-color: #DAE8FF; color: black;">74.90</td>
268
-
269
  </tr>
270
 
271
  <tr>
@@ -281,7 +333,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
281
  <td style="text-align:center; background-color: #DAE8FF; color: black;">79.89</td>
282
  <td style="text-align:center; background-color: #DAE8FF; color: black;">78.43</td>
283
  <td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
284
-
285
  </tr>
286
 
287
  <tr>
@@ -297,25 +349,9 @@ By implementing this innovative prevention strategy, we can significantly reduce
297
  <td style="text-align:center; background-color: #DAE8FF; color: black;">89.63</td>
298
  <td style="text-align:center; background-color: #DAE8FF; color: black;">85.79</td>
299
  <td style="text-align:center; background-color: #DAE8FF; color: black;">73.20</td>
300
-
301
- </tr>
302
-
303
-
304
- <tr>
305
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
306
- <td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
307
- <td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
308
- <td style="text-align:center; background-color: #DAE8FF; color: black;">57.11</td>
309
- <td style="text-align:center; background-color: #DAE8FF; color: black;">20.55</td>
310
- <td style="text-align:center; background-color: #DAE8FF; color: black;">59.79</td>
311
- <td style="text-align:center; background-color: #DAE8FF; color: black;">54.46</td>
312
- <td style="text-align:center; background-color: #DAE8FF; color: black;">18.68</td>
313
- <td style="text-align:center; background-color: #DAE8FF; color: black;">67.55</td>
314
- <td style="text-align:center; background-color: #DAE8FF; color: black;">79.45</td>
315
- <td style="text-align:center; background-color: #DAE8FF; color: black;">75.26</td>
316
- <td style="text-align:center; background-color: #DAE8FF; color: black;">63.59</td>
317
-
318
  </tr>
 
319
 
320
  <tr>
321
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
@@ -330,24 +366,8 @@ By implementing this innovative prevention strategy, we can significantly reduce
330
  <td style="text-align:center; background-color: #DAE8FF; color: black;">89.35</td>
331
  <td style="text-align:center; background-color: #DAE8FF; color: black;">85.72</td>
332
  <td style="text-align:center; background-color: #DAE8FF; color: black;">74.31</td>
333
-
334
  </tr>
335
-
336
- <tr>
337
- <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
338
- <td style="text-align:center; background-color: #DAE8FF; color: black;">24.86</td>
339
- <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
340
- <td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
341
- <td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
342
- <td style="text-align:center; background-color: #DAE8FF; color: black;">59.8</td>
343
- <td style="text-align:center; background-color: #DAE8FF; color: black;">52.27</td>
344
- <td style="text-align:center; background-color: #DAE8FF; color: black;">21.12</td>
345
- <td style="text-align:center; background-color: #DAE8FF; color: black;">67.02</td>
346
- <td style="text-align:center; background-color: #DAE8FF; color: black;">80.13</td>
347
- <td style="text-align:center; background-color: #DAE8FF; color: black;">73.39</td>
348
- <td style="text-align:center; background-color: #DAE8FF; color: black;">61.55</td>
349
- </tr>
350
-
351
  <tr>
352
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
353
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 57.56 </td>
@@ -361,22 +381,8 @@ By implementing this innovative prevention strategy, we can significantly reduce
361
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 89.73 </td>
362
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 86.09 </td>
363
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 74.82 </td>
364
- </tr>
365
-
366
- <tr>
367
- <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
368
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 28.86 </td>
369
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 43.45 </td>
370
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
371
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
372
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
373
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 5.41 </td>
374
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
375
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
376
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
377
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 75.68 </td>
378
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.8 </td>
379
- </tr>
380
  </tbody></table>
381
 
382
  <table>
@@ -388,37 +394,127 @@ By implementing this innovative prevention strategy, we can significantly reduce
388
  <th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
389
  </tr></thead>
390
  <tbody>
391
- <tr>
392
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
393
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 1.97 </td>
394
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 48.73 </td>
395
- </tr>
396
  <tr>
397
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
398
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
399
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.07 </td>
400
  </tr>
401
- <tr>
402
- <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
403
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 2.43 </td>
404
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.8 </td>
405
- </tr>
406
  <tr>
407
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
408
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
409
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.54 </td>
410
  </tr>
411
  <tr>
412
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
413
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 8.12 </td>
414
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
415
  </tr>
416
  <tr>
417
- <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
418
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 3.28 </td>
419
- <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.09 </td>
420
  </tr>
421
- </tbody></table>
422
 
423
  **Training Data:**
424
  Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive licenses and (2) internal synthetically generated data targeted to enhance reasoning capabilities.
 
13
  # Granite-3.3-2B-Instruct
14
 
15
  **Model Summary:**
16
+ Granite-3.3-2B-Instruct is a 2-billion parameter 128K context length language model fine-tuned for improved reasoning and instruction-following capabilities. Built on top of Granite-3.3-2B-Base, the model delivers significant gains on benchmarks for measuring generic performance including AlpacaEval-2.0 and Arena-Hard, and improvements in mathematics, coding, and instruction following. It also supports Fill-in-the-Middle (FIM) for code completion tasks and structured reasoning through \<think\>\<\/think\> and \<response\>\<\/response\> tags, providing clear separation between internal thoughts and final outputs. The model has been trained on a carefully balanced combination of permissively licensed data and curated synthetic tasks.
17
 
18
 
19
  - **Developers:** Granite Team, IBM
20
+ - **GitHub Repository:** [ibm-granite/granite-3.3-language-models](https://github.com/ibm-granite/granite-3.3-language-models)
21
  - **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
22
+ - **Release Date**: April 16th, 2025
23
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
24
 
25
  **Supported Languages:**
 
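The Model Summary in this hunk says the model separates internal thoughts from final outputs with `<think></think>` and `<response></response>` tags. A minimal sketch of how a caller might split such output is shown below. It assumes the tags appear literally in the decoded text, as the summary describes; the helper name `split_reasoning` and the sample string are hypothetical and not part of the README.

```python
import re
from typing import Tuple


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (thoughts, answer) from text that may contain the tags.

    Assumption: the model emits literal <think>...</think> and
    <response>...</response> tags, as the Model Summary describes.
    """
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    response = re.search(r"<response>(.*?)</response>", text, flags=re.DOTALL)
    thoughts = think.group(1).strip() if think else ""
    # Fall back to the whole text when no <response> tag is present,
    # e.g. when thinking is switched off.
    answer = response.group(1).strip() if response else text.strip()
    return thoughts, answer


# Hypothetical sample output, written by hand for illustration.
sample = "<think>2 + 2 is 4.</think><response>The answer is 4.</response>"
thoughts, answer = split_reasoning(sample)
print("thoughts:", thoughts)
print("answer:", answer)
```

The fallback branch keeps the helper usable on untagged output, so the same call site can handle both thinking modes.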
38
  * Code related tasks
39
  * Function-calling tasks
40
  * Multilingual dialog use cases
41
+ * **Fill-in-the-middle**
42
  * Long-context tasks including long document/meeting summarization, long document QA, etc.
43
 
44
 
 
84
  print(prediction)
85
  ```
86
 
87
+ **Example Outputs**
88
  **Example Outputs**
89
  - thinking=True
90
  ```md
 
208
  <table>
209
 
210
  <thead>
211
+ <caption style="text-align:center"><b>Comparison with different models over various benchmarks. Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True</b></caption>
212
  <tr>
213
  <th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
214
  <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
215
+ <th style="text-align:center; background-color: #001d6c; color: white;">AlpacaEval-2.0</th>
216
  <th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
217
  <th style="text-align:center; background-color: #001d6c; color: white;">PopQA</th>
218
  <th style="text-align:center; background-color: #001d6c; color: white;">TruthfulQA</th>
 
222
  <th style="text-align:center; background-color: #001d6c; color: white;">HumanEval</th>
223
  <th style="text-align:center; background-color: #001d6c; color: white;">HumanEval+</th>
224
  <th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
225
+ <th style="text-align:center; background-color: #001d6c; color: white;">Attaq</th>
226
  </tr></thead>
227
  <tbody>
228
+ <tr>
229
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
230
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
231
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
232
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">57.11</td>
233
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">20.55</td>
234
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">59.79</td>
235
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">54.46</td>
236
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">18.68</td>
237
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">67.55</td>
238
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">79.45</td>
239
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">75.26</td>
240
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">63.59</td>
241
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">84.7</td>
242
+ </tr>
243
+ <tr>
244
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
245
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">24.86</td>
246
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
247
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
248
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
249
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">59.8</td>
250
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">52.27</td>
251
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">21.12</td>
252
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">67.02</td>
253
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">80.13</td>
254
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">73.39</td>
255
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">61.55</td>
256
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">83.23</td>
257
+ </tr>
258
+ <tr>
259
+ <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
260
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 28.86 </td>
261
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 43.45 </td>
262
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
263
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
264
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
265
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 5.41 </td>
266
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
267
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
268
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
269
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 75.68 </td>
270
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.8 </td>
271
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">87.47</td>
272
+ </tr>
273
+
274
  <tr>
275
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Llama-3.1-8B-Instruct</td>
276
  <td style="text-align:center; background-color: #DAE8FF; color: black;">36.43</td>
 
284
  <td style="text-align:center; background-color: #DAE8FF; color: black;">85.32</td>
285
  <td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
286
  <td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
287
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">83.43</td>
288
 
289
  </tr>
290
 
 
301
  <td style="text-align:center; background-color: #DAE8FF; color: black;">67.54</td>
302
  <td style="text-align:center; background-color: #DAE8FF; color: black;">62.91</td>
303
  <td style="text-align:center; background-color: #DAE8FF; color: black;">66.50</td>
304
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">42.87</td>
305
  </tr>
306
 
307
  <tr>
 
317
  <td style="text-align:center; background-color: #DAE8FF; color: black;">93.35</td>
318
  <td style="text-align:center; background-color: #DAE8FF; color: black;">89.91</td>
319
  <td style="text-align:center; background-color: #DAE8FF; color: black;">74.90</td>
320
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">81.90</td>
321
  </tr>
322
 
323
  <tr>
 
333
  <td style="text-align:center; background-color: #DAE8FF; color: black;">79.89</td>
334
  <td style="text-align:center; background-color: #DAE8FF; color: black;">78.43</td>
335
  <td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
336
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">42.45</td>
337
  </tr>
338
 
339
  <tr>
 
349
  <td style="text-align:center; background-color: #DAE8FF; color: black;">89.63</td>
350
  <td style="text-align:center; background-color: #DAE8FF; color: black;">85.79</td>
351
  <td style="text-align:center; background-color: #DAE8FF; color: black;">73.20</td>
352
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">85.73</td>
353
  </tr>
354
+
355
 
356
  <tr>
357
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
 
366
  <td style="text-align:center; background-color: #DAE8FF; color: black;">89.35</td>
367
  <td style="text-align:center; background-color: #DAE8FF; color: black;">85.72</td>
368
  <td style="text-align:center; background-color: #DAE8FF; color: black;">74.31</td>
369
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">84.7</td>
370
  </tr>
371
  <tr>
372
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
373
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 57.56 </td>
 
381
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 89.73 </td>
382
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 86.09 </td>
383
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 74.82 </td>
384
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">88.5</td>
385
+ </tr>
386
  </tbody></table>
387
 
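To make the 2B rows of the comparison table above easier to read, the following sketch computes the per-benchmark change from Granite-3.2-2B-Instruct to Granite-3.3-2B-Instruct. The scores are copied from the rows added in this commit; only columns whose headers are visible in this diff are included, and the dictionary and variable names are illustrative.

```python
# Scores copied from the table rows added in this commit.
# Columns whose headers are not visible in the diff are left out.
granite_3_2_2b = {
    "ArenaHard": 24.86,
    "AlpacaEval-2.0": 34.51,
    "MMLU": 57.18,
    "PopQA": 20.56,
    "TruthfulQA": 59.8,
    "HumanEval": 80.13,
    "HumanEval+": 73.39,
    "IFEval": 61.55,
    "Attaq": 83.23,
}
granite_3_3_2b = {
    "ArenaHard": 28.86,
    "AlpacaEval-2.0": 43.45,
    "MMLU": 55.88,
    "PopQA": 18.4,
    "TruthfulQA": 58.97,
    "HumanEval": 80.51,
    "HumanEval+": 75.68,
    "IFEval": 65.8,
    "Attaq": 87.47,
}

# Positive values mean the 3.3 model scored higher on that benchmark.
deltas = {
    name: round(granite_3_3_2b[name] - granite_3_2_2b[name], 2)
    for name in granite_3_3_2b
}

for name, delta in deltas.items():
    print(f"{name:15s} {delta:+.2f}")
```

Printing signed differences makes it easy to see which benchmarks moved up and which moved down between the two releases.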
388
  <table>
 
394
  <th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
395
  </tr></thead>
396
  <tbody>
397
  <tr>
398
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
399
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
400
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.07 </td>
401
  </tr>
402
  <tr>
403
  <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
404
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
405
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.54 </td>
406
  </tr>
407
+ <tr>
408
+ <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
409
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 3.28 </td>
410
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.09 </td>
411
+ </tr>
412
+ <tr>
413
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
414
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 1.97 </td>
415
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 48.73 </td>
416
+ </tr>
417
+ <tr>
418
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
419
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 2.43 </td>
420
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.8 </td>
421
+ </tr>
422
  <tr>
423
  <td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
424
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 8.12 </td>
425
  <td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
426
  </tr>
427
+ </tbody></table>
428
+
429
+ </tbody></table>
430
+
431
+ <table>
432
+ <caption><b>Thinking Ablation</b></caption>
433
+ <thead>
434
  <tr>
435
+ <th rowspan="4" style="text-align:left; background-color: #001d6c; color: white;">Models</th>
436
+ <th colspan="4" style="text-align:center; background-color: #001d6c; color: white;">Thinking=False</th>
437
+ <th colspan="4" style="text-align:center; background-color: #001d6c; color: white;">Thinking=True</th>
438
  </tr>
439
+ <tr>
440
+ <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
441
+ <th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
442
+ <th style="text-align:center; background-color: #001d6c; color: white;">AIME24</th>
443
+ <th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
444
+ <th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
445
+ <th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
446
+ <th style="text-align:center; background-color: #001d6c; color: white;">AIME24</th>
447
+ <th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
448
+ </tr></thead>
449
+ <tbody>
450
+ <tr>
451
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
452
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
453
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
454
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">0.89</td>
455
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">35.07</td>
456
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
457
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
458
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
459
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
460
+ </tr>
461
+ <tr>
462
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
463
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">30.42</td>
464
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">31.65</td>
465
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">0.94</td>
466
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">37.15</td>
467
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
468
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
469
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">0.89</td>
470
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">35.54</td>
471
+ </tr>
472
+ <tr>
473
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.3-2B-Instruct</td>
474
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> - </td>
475
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> - </td>
476
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
477
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
478
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 28.86 </td>
479
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 43.45 </td>
480
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">3.28</td>
481
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">58.09</td>
482
+ </tr>
483
+ <tr>
484
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
485
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
486
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">30.34</td>
487
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">1.97</td>
488
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">48.73</td>
489
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
490
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
491
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
492
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
493
+ </tr>
494
+ <tr>
495
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
496
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">40.54</td>
497
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">36.89</td>
498
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">3.13</td>
499
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">50.78</td>
500
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
501
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">61.19</td>
502
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">2.43</td>
503
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">52.8</td>
504
+ </tr>
505
+ <tr>
506
+ <td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.3-8B-Instruct</td>
507
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> - </td>
508
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> - </td>
509
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
510
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
511
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 57.56 </td>
512
+ <td style="text-align:center; background-color: #DAE8FF; color: black;"> 62.68 </td>
513
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">8.12</td>
514
+ <td style="text-align:center; background-color: #DAE8FF; color: black;">69.02</td>
515
+ </tr>
516
+ </table>
517
+ <tbody>
518
 
519
  **Training Data:**
520
  Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive licenses and (2) internal synthetically generated data targeted to enhance reasoning capabilities.