Smashed 💪 Scored to 82.86 🔥2bit IQ2_M on MMLU Pro single shot benchmark

#7
by xbruce22 - opened

Earlier the same model scored 72.86, How I improved?
Few questions in MMLU Pro bench for GLM 4.5 Air took more than 15000 tokens to answer with 25min time.
So I increased max output tokens to 32k and timeout for API server to 1hr so that our bro has enough time to think 🤣

Highly underrated model. Tool calling (instruction following one) is also decent. (better than gpt-oss 120B)

logs

+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model                     | Dataset   | Metric          | Subset           |   Num |   Score | Cat.0   |
+===========================+===========+=================+==================+=======+=========+=========+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | computer science |    10 |  0.8    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | math             |    10 |  0.9    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | chemistry        |    10 |  0.8    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | engineering      |    10 |  0.9    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | law              |    10 |  0.5    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | biology          |    10 |  0.9    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | health           |    10 |  0.9    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | physics          |    10 |  1      | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | business         |    10 |  0.8    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | philosophy       |    10 |  0.9    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | economics        |    10 |  0.9    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | other            |    10 |  0.8    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | psychology       |    10 |  0.8    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | history          |    10 |  0.7    | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro  | AverageAccuracy | OVERALL          |   140 |  0.8286 | -       |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+

Hello, I would like to know which of the IQ2_KL model located at ubergarm/GLM-4.5-Air-GGUF, and the IQ2_M and Q2_K_XL models here, would be better. Thank you.

I have used unsloth's IQ2_M gguf (size: 44.3GB)

Sign up or log in to comment