Update README.md
Added results and modified model summary.
README.md
CHANGED
@@ -13,12 +13,13 @@ base_model:
|
|
13 |
# Granite-3.3-2B-Instruct
|
14 |
|
15 |
**Model Summary:**
|
16 |
-
Granite-3.3-2B-Instruct is
|
17 |
|
18 |
|
19 |
- **Developers:** Granite Team, IBM
|
|
|
20 |
- **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
|
21 |
-
- **Release Date**: April
|
22 |
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
|
23 |
|
24 |
**Supported Languages:**
|
@@ -37,6 +38,7 @@ This model is designed to handle general instruction-following tasks and can be
|
|
37 |
* Code related tasks
|
38 |
* Function-calling tasks
|
39 |
* Multilingual dialog use cases
|
|
|
40 |
* Long-context tasks including long document/meeting summarization, long document QA, etc.
|
41 |
|
42 |
|
@@ -82,6 +84,7 @@ prediction = tokenizer.decode(output[0, input_ids["input_ids"].shape[1]:], skip_
|
|
82 |
print(prediction)
|
83 |
```
|
84 |
|
|
|
85 |
**Example Outputs**
|
86 |
- thinking=True
|
87 |
```md
|
@@ -205,10 +208,11 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
205 |
<table>
|
206 |
|
207 |
<thead>
|
|
|
208 |
<tr>
|
209 |
<th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
210 |
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
211 |
-
<th style="text-align:center; background-color: #001d6c; color: white;">
|
212 |
<th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
|
213 |
<th style="text-align:center; background-color: #001d6c; color: white;">PopQA</th>
|
214 |
<th style="text-align:center; background-color: #001d6c; color: white;">TruthfulQA</th>
|
@@ -218,8 +222,55 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
218 |
<th style="text-align:center; background-color: #001d6c; color: white;">HumanEval</th>
|
219 |
<th style="text-align:center; background-color: #001d6c; color: white;">HumanEval+</th>
|
220 |
<th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
|
|
|
221 |
</tr></thead>
|
222 |
<tbody>
|
223 |
<tr>
|
224 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Llama-3.1-8B-Instruct</td>
|
225 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">36.43</td>
|
@@ -233,6 +284,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
233 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.32</td>
|
234 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
|
235 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
|
|
|
236 |
|
237 |
</tr>
|
238 |
|
@@ -249,7 +301,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
249 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">67.54</td>
|
250 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">62.91</td>
|
251 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">66.50</td>
|
252 |
-
|
253 |
</tr>
|
254 |
|
255 |
<tr>
|
@@ -265,7 +317,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
265 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">93.35</td>
|
266 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">89.91</td>
|
267 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">74.90</td>
|
268 |
-
|
269 |
</tr>
|
270 |
|
271 |
<tr>
|
@@ -281,7 +333,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
281 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">79.89</td>
|
282 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">78.43</td>
|
283 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
|
284 |
-
|
285 |
</tr>
|
286 |
|
287 |
<tr>
|
@@ -297,25 +349,9 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
297 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">89.63</td>
|
298 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.79</td>
|
299 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">73.20</td>
|
300 |
-
|
301 |
-
</tr>
|
302 |
-
|
303 |
-
|
304 |
-
<tr>
|
305 |
-
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
|
306 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
|
307 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
|
308 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">57.11</td>
|
309 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">20.55</td>
|
310 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">59.79</td>
|
311 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">54.46</td>
|
312 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">18.68</td>
|
313 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">67.55</td>
|
314 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">79.45</td>
|
315 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">75.26</td>
|
316 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">63.59</td>
|
317 |
-
|
318 |
</tr>
|
|
|
319 |
|
320 |
<tr>
|
321 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
@@ -330,24 +366,8 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
330 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">89.35</td>
|
331 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.72</td>
|
332 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">74.31</td>
|
333 |
-
|
334 |
</tr>
|
335 |
-
|
336 |
-
<tr>
|
337 |
-
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
|
338 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">24.86</td>
|
339 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
340 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
|
341 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
|
342 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">59.8</td>
|
343 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">52.27</td>
|
344 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">21.12</td>
|
345 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">67.02</td>
|
346 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">80.13</td>
|
347 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">73.39</td>
|
348 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">61.55</td>
|
349 |
-
</tr>
|
350 |
-
|
351 |
<tr>
|
352 |
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
|
353 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 57.56 </td>
|
@@ -361,22 +381,8 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
361 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 89.73 </td>
|
362 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 86.09 </td>
|
363 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 74.82 </td>
|
364 |
-
|
365 |
-
|
366 |
-
<tr>
|
367 |
-
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
|
368 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 28.86 </td>
|
369 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 43.45 </td>
|
370 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
|
371 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
|
372 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
|
373 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 5.41 </td>
|
374 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
|
375 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
|
376 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
|
377 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 75.68 </td>
|
378 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.8 </td>
|
379 |
-
</tr>
|
380 |
</tbody></table>
|
381 |
|
382 |
<table>
|
@@ -388,37 +394,127 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
388 |
<th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
|
389 |
</tr></thead>
|
390 |
<tbody>
|
391 |
-
<tr>
|
392 |
-
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
393 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 1.97 </td>
|
394 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 48.73 </td>
|
395 |
-
</tr>
|
396 |
<tr>
|
397 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
|
398 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
|
399 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.07 </td>
|
400 |
</tr>
|
401 |
-
<tr>
|
402 |
-
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
403 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 2.43 </td>
|
404 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.8 </td>
|
405 |
-
</tr>
|
406 |
<tr>
|
407 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
|
408 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
|
409 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.54 </td>
|
410 |
</tr>
|
411 |
<tr>
|
412 |
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
|
413 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 8.12 </td>
|
414 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
|
415 |
</tr>
|
416 |
<tr>
|
417 |
-
|
418 |
-
<
|
419 |
-
<
|
420 |
</tr>
|
421 |
-
|
422 |
|
423 |
**Training Data:**
|
424 |
Overall, our training data largely comprises two key sources: (1) publicly available datasets with permissive licenses, and (2) internal synthetically generated data targeted to enhance reasoning capabilities.
|
|
|
13 |
# Granite-3.3-2B-Instruct
|
14 |
|
15 |
**Model Summary:**
|
16 |
+
Granite-3.3-2B-Instruct is a 2-billion-parameter language model with a 128K context length, fine-tuned for improved reasoning and instruction-following capabilities. Built on top of Granite-3.3-2B-Base, the model delivers significant gains on general-performance benchmarks, including AlpacaEval-2.0 and Arena-Hard, along with improvements in mathematics, coding, and instruction following. It also supports Fill-in-the-Middle (FIM) for code completion tasks and structured reasoning through \<think\>\<\/think\> and \<response\>\<\/response\> tags, providing clear separation between internal thoughts and final outputs. The model has been trained on a carefully balanced combination of permissively licensed data and curated synthetic tasks.
|
17 |
|
18 |
|
19 |
- **Developers:** Granite Team, IBM
|
20 |
+
- **GitHub Repository:** [ibm-granite/granite-3.3-language-models](https://github.com/ibm-granite/granite-3.3-language-models)
|
21 |
- **Website**: [Granite Docs](https://www.ibm.com/granite/docs/)
|
22 |
+
- **Release Date**: April 16th, 2025
|
23 |
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
|
24 |
|
25 |
**Supported Languages:**
|
|
|
38 |
* Code related tasks
|
39 |
* Function-calling tasks
|
40 |
* Multilingual dialog use cases
|
41 |
+
* **Fill-in-the-middle**
|
42 |
* Long-context tasks including long document/meeting summarization, long document QA, etc.
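As a rough illustration of the fill-in-the-middle use case, the sketch below assembles a FIM prompt from the code before and after a gap. The sentinel token strings used here (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are assumptions for illustration and are not confirmed by this README; check the tokenizer's special tokens before relying on them. `build_fim_prompt` is a hypothetical helper, not part of any library.

```python
def build_fim_prompt(
    prefix: str,
    suffix: str,
    prefix_token: str = "<fim_prefix>",   # assumed sentinel, verify against the tokenizer
    suffix_token: str = "<fim_suffix>",   # assumed sentinel, verify against the tokenizer
    middle_token: str = "<fim_middle>",   # assumed sentinel, verify against the tokenizer
) -> str:
    """Return a prompt asking the model to generate the code between prefix and suffix."""
    return f"{prefix_token}{prefix}{suffix_token}{suffix}{middle_token}"


# Code surrounding the gap the model should fill.
before_gap = "def add(a, b):\n    "
after_gap = "\n    return result\n"

fim_prompt = build_fim_prompt(before_gap, after_gap)
print(fim_prompt)
```

The resulting string would be tokenized and passed to `generate` like any other prompt; the text produced after the final sentinel is the candidate middle section.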
|
43 |
|
44 |
|
|
|
84 |
print(prediction)
|
85 |
```
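When thinking is enabled, the Model Summary describes the output as separated into `<think>` and `<response>` tags. A minimal sketch for splitting a decoded string such as `prediction` into those two parts is shown here; `split_thinking_output` is a hypothetical helper, and the exact tag layout in real generations is an assumption that should be checked against actual model output.

```python
import re


def split_thinking_output(text: str) -> dict:
    """Split decoded text into reasoning and final answer.

    Assumes the layout <think>...</think><response>...</response>.
    If the tags are absent, the whole text is treated as the response.
    """
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    response = re.search(r"<response>(.*?)</response>", text, flags=re.DOTALL)
    return {
        "thinking": think.group(1).strip() if think else None,
        "response": response.group(1).strip() if response else text.strip(),
    }


# Hand-written sample string, not real model output.
sample = "<think>2 + 2 is basic addition.</think><response>The answer is 4.</response>"
parts = split_thinking_output(sample)
print(parts["response"])
```

Falling back to the raw text keeps the helper usable for `thinking=False` generations, where no tags are expected.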
|
86 |
|
87 |
+
|
88 |
**Example Outputs**
|
89 |
- thinking=True
|
90 |
```md
|
|
|
208 |
<table>
|
209 |
|
210 |
<thead>
|
211 |
+
<caption style="text-align:center"><b>Comparison with different models over various benchmarks. Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True</b></caption>
|
212 |
<tr>
|
213 |
<th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
214 |
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
215 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">AlpacaEval-2.0</th>
|
216 |
<th style="text-align:center; background-color: #001d6c; color: white;">MMLU</th>
|
217 |
<th style="text-align:center; background-color: #001d6c; color: white;">PopQA</th>
|
218 |
<th style="text-align:center; background-color: #001d6c; color: white;">TruthfulQA</th>
|
|
|
222 |
<th style="text-align:center; background-color: #001d6c; color: white;">HumanEval</th>
|
223 |
<th style="text-align:center; background-color: #001d6c; color: white;">HumanEval+</th>
|
224 |
<th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
|
225 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">AttaQ</th>
|
226 |
</tr></thead>
|
227 |
<tbody>
|
228 |
+
<tr>
|
229 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
|
230 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
|
231 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
|
232 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">57.11</td>
|
233 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">20.55</td>
|
234 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">59.79</td>
|
235 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">54.46</td>
|
236 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">18.68</td>
|
237 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">67.55</td>
|
238 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">79.45</td>
|
239 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">75.26</td>
|
240 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">63.59</td>
|
241 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">84.7</td>
|
242 |
+
</tr>
|
243 |
+
<tr>
|
244 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
|
245 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">24.86</td>
|
246 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
247 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
|
248 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
|
249 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">59.8</td>
|
250 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">52.27</td>
|
251 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">21.12</td>
|
252 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">67.02</td>
|
253 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">80.13</td>
|
254 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">73.39</td>
|
255 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">61.55</td>
|
256 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">83.23</td>
|
257 |
+
</tr>
|
258 |
+
<tr>
|
259 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
|
260 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 28.86 </td>
|
261 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 43.45 </td>
|
262 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
|
263 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
|
264 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
|
265 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 5.41 </td>
|
266 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
|
267 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
|
268 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
|
269 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 75.68 </td>
|
270 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.8 </td>
|
271 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">87.47</td>
|
272 |
+
</tr>
|
273 |
+
|
274 |
<tr>
|
275 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Llama-3.1-8B-Instruct</td>
|
276 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">36.43</td>
|
|
|
284 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.32</td>
|
285 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
|
286 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
|
287 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">83.43</td>
|
288 |
|
289 |
</tr>
|
290 |
|
|
|
301 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">67.54</td>
|
302 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">62.91</td>
|
303 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">66.50</td>
|
304 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">42.87</td>
|
305 |
</tr>
|
306 |
|
307 |
<tr>
|
|
|
317 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">93.35</td>
|
318 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">89.91</td>
|
319 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">74.90</td>
|
320 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">81.90</td>
|
321 |
</tr>
|
322 |
|
323 |
<tr>
|
|
|
333 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">79.89</td>
|
334 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">78.43</td>
|
335 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
|
336 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">42.45</td>
|
337 |
</tr>
|
338 |
|
339 |
<tr>
|
|
|
349 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">89.63</td>
|
350 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.79</td>
|
351 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">73.20</td>
|
352 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.73</td>
|
353 |
</tr>
|
354 |
+
|
355 |
|
356 |
<tr>
|
357 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
|
|
366 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">89.35</td>
|
367 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.72</td>
|
368 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">74.31</td>
|
369 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">84.7</td>
|
370 |
</tr>
|
371 |
<tr>
|
372 |
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
|
373 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 57.56 </td>
|
|
|
381 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 89.73 </td>
|
382 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 86.09 </td>
|
383 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 74.82 </td>
|
384 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">88.5</td>
|
385 |
+
</tr>
|
386 |
</tbody></table>
|
387 |
|
388 |
<table>
|
|
|
394 |
<th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
|
395 |
</tr></thead>
|
396 |
<tbody>
|
397 |
<tr>
|
398 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
|
399 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
|
400 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.07 </td>
|
401 |
</tr>
|
402 |
<tr>
|
403 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
|
404 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 0.89 </td>
|
405 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.54 </td>
|
406 |
</tr>
|
407 |
+
<tr>
|
408 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-2B-Instruct</b></td>
|
409 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 3.28 </td>
|
410 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.09 </td>
|
411 |
+
</tr>
|
412 |
+
<tr>
|
413 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
414 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 1.97 </td>
|
415 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 48.73 </td>
|
416 |
+
</tr>
|
417 |
+
<tr>
|
418 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
419 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 2.43 </td>
|
420 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.8 </td>
|
421 |
+
</tr>
|
422 |
<tr>
|
423 |
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.3-8B-Instruct</b></td>
|
424 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 8.12 </td>
|
425 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
|
426 |
</tr>
|
427 |
+
</tbody></table>
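To make the size of the 2B-model gains in the AIME24 / MATH500 table easier to read, the following sketch computes the score differences between releases. The numbers are copied from the table rows above; the `scores` dictionary and the `gain` helper are illustrative names only.

```python
# Scores copied from the AIME24 / MATH500 table (2B models only).
scores = {
    "Granite-3.1-2B-Instruct": {"AIME24": 0.89, "MATH500": 35.07},
    "Granite-3.2-2B-Instruct": {"AIME24": 0.89, "MATH500": 35.54},
    "Granite-3.3-2B-Instruct": {"AIME24": 3.28, "MATH500": 58.09},
}


def gain(new_model: str, old_model: str, benchmark: str) -> float:
    """Absolute score difference (new minus old), rounded to two decimals."""
    return round(scores[new_model][benchmark] - scores[old_model][benchmark], 2)


for benchmark in ("AIME24", "MATH500"):
    delta = gain("Granite-3.3-2B-Instruct", "Granite-3.2-2B-Instruct", benchmark)
    print(f"{benchmark}: {delta:+.2f} points over Granite-3.2-2B-Instruct")
```

These are absolute point differences on each benchmark's own scale, so the two values are not directly comparable with each other.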
|
428 |
+
|
429 |
+
|
430 |
+
|
431 |
+
<table>
|
432 |
+
<caption><b>Thinking Ablation</b></caption>
|
433 |
+
<thead>
|
434 |
<tr>
|
435 |
+
<th rowspan="2" style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
436 |
+
<th colspan="4" style="text-align:center; background-color: #001d6c; color: white;">Thinking=False</th>
|
437 |
+
<th colspan="4" style="text-align:center; background-color: #001d6c; color: white;">Thinking=True</th>
|
438 |
</tr>
|
439 |
+
<tr>
|
440 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
441 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">AlpacaEval-2.0</th>
|
442 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">AIME24</th>
|
443 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
|
444 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
445 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">AlpacaEval-2.0</th>
|
446 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">AIME24</th>
|
447 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">MATH500</th>
|
448 |
+
</tr></thead>
|
449 |
+
<tbody>
|
450 |
+
<tr>
|
451 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
|
452 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
|
453 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
|
454 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">0.89</td>
|
455 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">35.07</td>
|
456 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
457 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
458 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
459 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
460 |
+
</tr>
|
461 |
+
<tr>
|
462 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-2B-Instruct</td>
|
463 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">30.42</td>
|
464 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">31.65</td>
|
465 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">0.94</td>
|
466 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">37.15</td>
|
467 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
|
468 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
469 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">0.89</td>
|
470 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">35.54</td>
|
471 |
+
</tr>
|
472 |
+
<tr>
|
473 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.3-2B-Instruct</td>
|
474 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> - </td>
|
475 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> - </td>
|
476 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
477 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
478 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 28.86 </td>
|
479 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 43.45 </td>
|
480 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">3.28</td>
|
481 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">58.09</td>
|
482 |
+
</tr>
|
483 |
+
<tr>
|
484 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
485 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
|
486 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">30.34</td>
|
487 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">1.97</td>
|
488 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">48.73</td>
|
489 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
490 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
491 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
492 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
493 |
+
</tr>
|
494 |
+
<tr>
|
495 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
496 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">40.54</td>
|
497 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">36.89</td>
|
498 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">3.13</td>
|
499 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">50.78</td>
|
500 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
|
501 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">61.19</td>
|
502 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">2.43</td>
|
503 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">52.8</td>
|
504 |
+
</tr>
|
505 |
+
<tr>
|
506 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.3-8B-Instruct</td>
|
507 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> - </td>
|
508 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> - </td>
|
509 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
510 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
511 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 57.56 </td>
|
512 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 62.68 </td>
|
513 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">8.12</td>
|
514 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">69.02</td>
|
515 |
+
</tr>
|
516 |
+
</tbody>
|
517 |
+
</table>
|
518 |
|
519 |
**Training Data:**
|
520 |
Overall, our training data largely comprises two key sources: (1) publicly available datasets with permissive licenses, and (2) internal synthetically generated data targeted to enhance reasoning capabilities.
|