Update README.md
README.md
`TODO: more complete eval results`

| Benchmark                        | Phi-4 mini-Ins | phi4-mini-int4wo |
|----------------------------------|----------------|------------------|
| **Popular aggregated benchmark** |                |                  |
| mmlu (0-shot)                    |                | x                |
| mmlu_pro (5-shot)                |                | x                |
| **Reasoning**                    |                |                  |
| arc_challenge (0-shot)           |                | x                |
| gpqa_main_zeroshot               |                | x                |
| HellaSwag                        | 54.57          | 54.55            |
| openbookqa                       |                | x                |
| piqa (0-shot)                    |                | x                |
| social_iqa                       |                | x                |
| truthfulqa_mc2 (0-shot)          |                | x                |
| winogrande (0-shot)              |                | x                |
| **Multilingual**                 |                |                  |
| mgsm_en_cot_en                   |                | x                |
| **Math**                         |                |                  |
| gsm8k (5-shot)                   |                | x                |
| mathqa (0-shot)                  |                | x                |
| **Overall**                      | **TODO**       | **TODO**         |
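The missing cells can presumably be filled in with EleutherAI's lm-evaluation-harness, which the README's existing eval commands already use. A sketch for one row, reusing the `pytorch/Phi-4-mini-instruct-float8dq` checkpoint id that appears elsewhere in this README (the int4wo column would point `pretrained=` at the corresponding int4wo checkpoint instead); the task names (`hellaswag`, `gsm8k`, ...) follow lm-eval conventions, and the batch size here is an assumption:

```shell
# Sketch: evaluate one benchmark row (HellaSwag, 0-shot) with lm-eval.
# Swap --tasks / --num_fewshot per table row (e.g. gsm8k with --num_fewshot 5).
lm_eval --model hf \
  --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq \
  --tasks hellaswag \
  --num_fewshot 0 \
  --batch_size 8
```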
# Model Performance

## Results (H100 machine)