Update README.md
correction in bigbenchhard score and removal of thinking ablation table.
README.md
CHANGED
|
@@ -262,7 +262,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 262 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
|
| 263 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
|
| 264 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
|
| 265 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">
|
| 266 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
|
| 267 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
|
| 268 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
|
|
@@ -375,7 +375,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 375 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.54 </td>
|
| 376 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 26.17 </td>
|
| 377 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 66.86 </td>
|
| 378 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">
|
| 379 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 41.53 </td>
|
| 380 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.89 </td>
|
| 381 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 89.73 </td>
|
|
@@ -428,7 +428,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 428 |
|
| 429 |
</tbody></table>
|
| 430 |
|
| 431 |
-
<table>
|
| 432 |
<caption><b>Thinking Ablation</b></caption>
|
| 433 |
<thead>
|
| 434 |
<tr>
|
|
@@ -514,7 +514,7 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 514 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">69.02</td>
|
| 515 |
</tr>
|
| 516 |
</table>
|
| 517 |
-
<tbody>
|
| 518 |
|
| 519 |
**Training Data:**
|
| 520 |
Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive licenses, (2) internal synthetically generated data targeted to enhance reasoning capabilities.
|
|
|
|
| 262 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 55.88 </td>
|
| 263 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 18.4 </td>
|
| 264 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 58.97 </td>
|
| 265 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 52.51 </td>
|
| 266 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 35.98 </td>
|
| 267 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 72.48 </td>
|
| 268 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.51 </td>
|
|
|
|
| 375 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 65.54 </td>
|
| 376 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 26.17 </td>
|
| 377 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 66.86 </td>
|
| 378 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 59.01 </td>
|
| 379 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 41.53 </td>
|
| 380 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 80.89 </td>
|
| 381 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 89.73 </td>
|
|
|
|
| 428 |
|
| 429 |
</tbody></table>
|
| 430 |
|
| 431 |
+
<!-- <table>
|
| 432 |
<caption><b>Thinking Ablation</b></caption>
|
| 433 |
<thead>
|
| 434 |
<tr>
|
|
|
|
| 514 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">69.02</td>
|
| 515 |
</tr>
|
| 516 |
</table>
|
| 517 |
+
<tbody> -->
|
| 518 |
|
| 519 |
**Training Data:**
|
| 520 |
Overall, our training data is largely comprised of two key sources: (1) publicly available datasets with permissive licenses, (2) internal synthetically generated data targeted to enhance reasoning capabilities.
|