alexmarques committed
Commit 06c5c33 · verified
1 Parent(s): 6eefe43

Update README.md

  1. README.md +136 -0
README.md CHANGED
@@ -223,4 +223,140 @@ The model was evaluated on the OpenLLM leaderboard tasks (version 1), using [lm-
  <td><strong>100.1%</strong>
  </td>
  </tr>
+ <tr>
+ <td rowspan="7" ><strong>OpenLLM v2</strong>
+ </td>
+ <td>MMLU-Pro (5-shot)
+ </td>
+ <td>17.25
+ </td>
+ <td>18.31
+ </td>
+ <td>---
+ </td>
+ </tr>
+ <tr>
+ <td>IFEval (0-shot)
+ </td>
+ <td>62.83
+ </td>
+ <td>60.07
+ </td>
+ <td>95.6%
+ </td>
+ </tr>
+ <tr>
+ <td>BBH (3-shot)
+ </td>
+ <td>4.23
+ </td>
+ <td>2.72
+ </td>
+ <td>---
+ </td>
+ </tr>
+ <tr>
+ <td>Math-lvl-5 (4-shot)
+ </td>
+ <td>18.26
+ </td>
+ <td>14.63
+ </td>
+ <td>---
+ </td>
+ </tr>
+ <tr>
+ <td>GPQA (0-shot)
+ </td>
+ <td>0.00
+ </td>
+ <td>0.00
+ </td>
+ <td>---
+ </td>
+ </tr>
+ <tr>
+ <td>MuSR (0-shot)
+ </td>
+ <td>0.00
+ </td>
+ <td>0.00
+ </td>
+ <td>---
+ </td>
+ </tr>
+ <tr>
+ <td><strong>Average</strong>
+ </td>
+ <td><strong>17.10</strong>
+ </td>
+ <td><strong>15.96</strong>
+ </td>
+ <td><strong>---</strong>
+ </td>
+ </tr>
+ <tr>
+ <td><strong>Multilingual</strong>
+ </td>
+ <td>MGSM (0-shot)
+ </td>
+ <td>19.70
+ </td>
+ <td>19.90
+ </td>
+ <td>---
+ </td>
+ </tr>
+ <tr>
+ <td rowspan="6" ><strong>Reasoning<br>(generation)</strong>
+ </td>
+ <td>AIME 2024
+ </td>
+ <td>9.69
+ </td>
+ <td>9.58
+ </td>
+ <td>---
+ </td>
+ </tr>
+ <tr>
+ <td>AIME 2025
+ </td>
+ <td>13.13
+ </td>
+ <td>12.92
+ </td>
+ <td>---
+ </td>
+ </tr>
+ <tr>
+ <td>GPQA diamond
+ </td>
+ <td>29.29
+ </td>
+ <td>25.76
+ </td>
+ <td>88.0%
+ </td>
+ </tr>
+ <tr>
+ <td>Math-lvl-5
+ </td>
+ <td>71.60
+ </td>
+ <td>70.60
+ </td>
+ <td>98.6%
+ </td>
+ </tr>
+ <tr>
+ <td>LiveCodeBench
+ </td>
+ <td>12.83
+ </td>
+ <td>13.11
+ </td>
+ <td>---
+ </td>
+ </tr>
  </table>
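
The percentage column in the added rows can be sanity-checked with a short script. This is a minimal sketch, not the author's method: it assumes the first numeric column is the baseline score, the second is the quantized model's score, and the percentage is the quantized score divided by the baseline. The column headers are outside this hunk, so that reading is an assumption, and `recovery_pct` is a hypothetical helper name.

```python
def recovery_pct(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score, to one decimal.

    Assumption: this is how the table's percentage column is defined.
    """
    if baseline == 0:
        raise ValueError("recovery is undefined when the baseline score is 0")
    return round(100.0 * quantized / baseline, 1)


# (baseline, quantized) pairs copied from the rows that report a percentage.
rows = {
    "IFEval (0-shot)": (62.83, 60.07),
    "GPQA diamond": (29.29, 25.76),
    "Math-lvl-5": (71.60, 70.60),
}

for task, (base, quant) in rows.items():
    print(f"{task}: {recovery_pct(base, quant)}%")
```

From the rounded scores shown in the table, this reproduces 95.6% for IFEval and 98.6% for Math-lvl-5, and gives 87.9% for GPQA diamond where the table lists 88.0%. The small gap may come from the table being computed on unrounded scores, which this hunk does not show.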