Update README.md
README.md
`TODO: more complete eval results`

| Benchmark                        | Phi-4 mini-Ins | phi4-mini-int4wo |
|----------------------------------|----------------|------------------|
| **Popular aggregated benchmark** |                |                  |
| mmlu (0-shot)                    |                | x                |
| mmlu_pro (5-shot)                |                | x                |
| **Reasoning**                    |                |                  |
| arc_challenge (0-shot)           |                | x                |
| gpqa_main_zeroshot               |                | x                |
| HellaSwag                        | 54.57          | 54.55            |
| openbookqa                       |                | x                |
| piqa (0-shot)                    |                | x                |
| social_iqa                       |                | x                |
| truthfulqa_mc2 (0-shot)          |                | x                |
| winogrande (0-shot)              |                | x                |
| **Multilingual**                 |                |                  |
| mgsm_en_cot_en                   |                | x                |
| **Math**                         |                |                  |
| gsm8k (5-shot)                   |                | x                |
| mathqa (0-shot)                  |                | x                |
| **Overall**                      | **TODO**       | **TODO**         |
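The missing cells can presumably be filled in with EleutherAI's lm-evaluation-harness, which the README's existing eval commands already use. A sketch for one row, reusing the `pytorch/Phi-4-mini-instruct-float8dq` checkpoint id that appears elsewhere in this README (the int4wo column would point `pretrained=` at the corresponding int4wo checkpoint instead); the task names (`hellaswag`, `gsm8k`, ...) follow lm-eval conventions, and the batch size here is an assumption:

```shell
# Sketch: evaluate one benchmark row (HellaSwag, 0-shot) with lm-eval.
# Swap --tasks / --num_fewshot per table row (e.g. gsm8k with --num_fewshot 5).
lm_eval --model hf \
  --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq \
  --tasks hellaswag \
  --num_fewshot 0 \
  --batch_size 8
```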
# Model Performance

## Results (H100 machine)