Update README.md
README.md
CHANGED
@@ -159,21 +159,7 @@ All evaluations are obtained through [lm-evaluation-harness](https://github.com/
 | GPQA<br>0-shot | 102.6 | 31.88 | 32.72 |
 | MuSR<br>0-shot | 101.2 | 42.20 | 42.72 |
 | MMLU-Pro<br>5-shot | 99.12 | 55.70 | 55.21 |
-| **OpenLLM v2<br>Average Score** | **100.48** | **56.60** | **56.87** |
-| RULER<br>seqlen = 131072<br>niah_multikey_1 | ? | 88.20 | ? |
-| RULER<br>seqlen = 131072<br>niah_multikey_2 | ? | 83.60 | ? |
-| RULER<br>seqlen = 131072<br>niah_multikey_3 | ? | 78.80 | ? |
-| RULER<br>seqlen = 131072<br>niah_multiquery | ? | 95.40 | ? |
-| RULER<br>seqlen = 131072<br>niah_multivalue | ? | 73.75 | ? |
-| RULER<br>seqlen = 131072<br>niah_single_1 | ? | 100.00 | ? |
-| RULER<br>seqlen = 131072<br>niah_single_2 | ? | 99.80 | ? |
-| RULER<br>seqlen = 131072<br>niah_single_3 | ? | 99.80 | ? |
-| RULER<br>seqlen = 131072<br>ruler_cwe | ? | 39.42 | ? |
-| RULER<br>seqlen = 131072<br>ruler_fwe | ? | 92.93 | ? |
-| RULER<br>seqlen = 131072<br>ruler_qa_hotpot | ? | 48.20 | ? |
-| RULER<br>seqlen = 131072<br>ruler_qa_squad | ? | 53.57 | ? |
-| RULER<br>seqlen = 131072<br>ruler_qa_vt | ? | 92.28 | ? |
-| **RULER<br>seqlen = 131072<br>Average Score** | **?** | **80.44** | **?** |
+| **OpenLLM v2<br>Average Score** | **100.48** | **56.60** | **56.87** | |
 | MMMU<br>0-shot | 101.6 | 53.44 | 54.33 |
 | ChartQA<br>0-shot<br>exact_match | 100.8 | 65.88 | 66.44 |
 | ChartQA<br>0-shot<br>relaxed_accuracy | 99.82 | 88.92 | 88.76 |