Update README.md
README.md
CHANGED
|
@@ -73,7 +73,9 @@ GPT-4All Benchmark Set
|
|
| 73 |
|piqa | 0|acc |0.7922|± |0.0095|
|
| 74 |
| | |acc_norm|0.8112|± |0.0091|
|
| 75 |
|winogrande | 0|acc |0.7293|± |0.0125|
|
| 76 |
-
|
|
|
|
|
|
|
| 77 |
AGI-Eval
|
| 78 |
```
|
| 79 |
| Task |Version| Metric |Value | |Stderr|
|
|
@@ -94,6 +96,7 @@ AGI-Eval
|
|
| 94 |
| | |acc_norm|0.4029|± |0.0343|
|
| 95 |
|agieval_sat_math | 0|acc |0.3273|± |0.0317|
|
| 96 |
| | |acc_norm|0.2636|± |0.0298|
|
|
|
|
| 97 |
```
|
| 98 |
BigBench Reasoning Test
|
| 99 |
```
|
|
@@ -118,6 +121,7 @@ BigBench Reasoning Test
|
|
| 118 |
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2048|± |0.0114|
|
| 119 |
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1297|± |0.0080|
|
| 120 |
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4500|± |0.0288|
|
|
|
|
| 121 |
```
|
| 122 |
|
| 123 |
This is a slight improvement on GPT4ALL Suite and BigBench Suite, with a degradation in AGIEval compared to the original Hermes.
|
|
|
|
| 73 |
|piqa | 0|acc |0.7922|± |0.0095|
|
| 74 |
| | |acc_norm|0.8112|± |0.0091|
|
| 75 |
|winogrande | 0|acc |0.7293|± |0.0125|
|
| 76 |
+
Average: 0.7036
|
| 77 |
+
```
|
| 78 |
+
|
| 79 |
AGI-Eval
|
| 80 |
```
|
| 81 |
| Task |Version| Metric |Value | |Stderr|
|
|
|
|
| 96 |
| | |acc_norm|0.4029|± |0.0343|
|
| 97 |
|agieval_sat_math | 0|acc |0.3273|± |0.0317|
|
| 98 |
| | |acc_norm|0.2636|± |0.0298|
|
| 99 |
+
Average: 0.3556
|
| 100 |
```
|
| 101 |
BigBench Reasoning Test
|
| 102 |
```
|
|
|
|
| 121 |
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2048|± |0.0114|
|
| 122 |
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1297|± |0.0080|
|
| 123 |
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4500|± |0.0288|
|
| 124 |
+
Average: 0.3675
|
| 125 |
```
|
| 126 |
|
| 127 |
This is a slight improvement on GPT4ALL Suite and BigBench Suite, with a degradation in AGIEval compared to the original Hermes.
|
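The `Average:` lines added in this change summarize each results table as a single number. As a minimal sketch of how such a figure could be reproduced, the snippet below parses pipe-delimited rows in the same layout and averages one metric. The helper names `parse_rows` and `average_metric` are hypothetical, and the choice of which metric to average is an assumption, since the README does not say how its averages were computed. The sample input is only the three BigBench rows visible in this diff, so the result is not expected to match the reported average for the full table.

```python
from statistics import mean

# Only the three BigBench rows visible in the diff; the full table has more rows.
SAMPLE = """
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2048|± |0.0114|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1297|± |0.0080|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4500|± |0.0288|
"""


def parse_rows(table):
    """Yield (task, metric, value) for each data row of a pipe-delimited table."""
    task = None
    for line in table.strip().splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if len(cells) < 4:
            continue
        name, _version, metric, value = cells[:4]
        try:
            score = float(value)
        except ValueError:
            continue  # header or separator row
        # Continuation rows (e.g. acc_norm) leave the task cell empty.
        task = name or task
        yield task, metric, score


def average_metric(table, metric):
    """Return the unweighted mean of `metric` over all rows (assumed averaging rule)."""
    return mean(score for _task, m, score in parse_rows(table) if m == metric)


if __name__ == "__main__":
    print(round(average_metric(SAMPLE, "multiple_choice_grade"), 4))
```

Pasting a complete table into `SAMPLE` and choosing the metric the author intended (for example `acc_norm` where present, otherwise `acc`) would be needed to check the averages stated in the README.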