pytorch/Qwen3-8B-INT4

@@ -196,30 +196,27 @@ lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks
 ## int4 weight only quantization with hqq (int4wo-hqq)
 ```Shell
-lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hqq --tasks hellaswag --device cuda:0 --batch_size 8
 ```
 | Benchmark                        |                |                           |
 |----------------------------------|----------------|---------------------------|
-|                                  | Phi-4-mini-ins | Phi-4-mini-ins-int4wo-hqq |
-| **Popular aggregated benchmark** |                |                           |
-| mmlu (0-shot)                    | 66.73          |  63.56                    |
-| mmlu_pro (5-shot)                | 46.43          |  36.74                    |
-| **Reasoning**                    |                |                           |
-| arc_challenge (0-shot)           | 56.91          |  54.86                    |
-| gpqa_main_zeroshot               | 30.13          |  30.58                    |
-| HellaSwag                        | 54.57          |  53.54                    |
-| openbookqa                       | 33.00          |  34.40                    |
-| piqa (0-shot)                    | 77.64          |  76.33                    |
-| social_iqa                       | 49.59          |  47.90                    |
-| truthfulqa_mc2 (0-shot)          | 48.39          |  46.44                    |
-| winogrande  (0-shot)             | 71.11          |  71.51                    |
 | **Multilingual**                 |                |                           |
-| mgsm_en_cot_en                   | 60.8           |  59.6                     |
 | **Math**                         |                |                           |
-| gsm8k (5-shot)                   | 81.88          |  74.37                    |
-| mathqa (0-shot)                  | 42.31          |  42.75                    |
-| **Overall**                      | **55.35**      | **53.28**                 |
 # Peak Memory Usage

 ## int4 weight only quantization with hqq (int4wo-hqq)
 ```Shell
+export MODEL=pytorch/Qwen3-8B-int4wo-hqq
+# or
+# export MODEL=Qwen/Qwen3-8B
+lm_eval --model hf --model_args pretrained=$MODEL --tasks hellaswag --device cuda:0 --batch_size 8
 ```
 | Benchmark                        |                |                           |
 |----------------------------------|----------------|---------------------------|
+|                                  | Qwen3-8B       | Qwen3-8B-int4wo-hqq       |
+| **General**                      |                |                           |
+| mmlu                             | 73.04          | 70.4                      |
+| mmlu_pro                         | 53.81          | 52.79                     |
+| bbh                              | 79.33          | WIP                       |
 | **Multilingual**                 |                |                           |
+| mgsm_en_cot_en                   | 39.6           | 33.2                      |
+| m_mmlu                           | WIP            | WIP                       |
 | **Math**                         |                |                           |
+| gpqa_main_zeroshot               | 35.71          | 32.14                     |
+| gsm8k                            | 87.79          | 86.28                     |
+| leaderboard_math_hard            | WIP            | WIP                       |
+| **Overall**                      | WIP            | WIP                       |
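
As a reading aid for the Qwen3-8B table above, the gap between the baseline and the quantized checkpoint can be computed per task. The snippet below is a minimal sketch and not part of the model card: the `scores` variable and the `print_deltas` helper are hypothetical names, the numbers are copied by hand from the table, and rows still marked WIP are left out.

```Shell
# Scores copied from the table above: task, Qwen3-8B, quantized checkpoint.
# Rows marked WIP are omitted because they have no numbers yet.
scores="mmlu 73.04 70.4
mmlu_pro 53.81 52.79
mgsm_en_cot_en 39.6 33.2
gpqa_main_zeroshot 35.71 32.14
gsm8k 87.79 86.28"

# Hypothetical helper: print absolute and relative change per task
# (quantized minus baseline, so a negative value is an accuracy drop).
print_deltas() {
  printf '%s\n' "$scores" | awk '{
    d = $3 - $2
    printf "%-20s %+.2f (%+.1f%%)\n", $1, d, 100 * d / $2
  }'
}

print_deltas
```

The relative column is often more informative than the absolute one here, since the same absolute drop matters more on a task where the baseline score is already low.
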
 # Peak Memory Usage