Smashed 💪 Scored to 82.86 🔥2bit IQ2_M on MMLU Pro single shot benchmark
#7
by
xbruce22
- opened
Earlier the same model scored 72.86, How I improved?
Few questions in MMLU Pro bench for GLM 4.5 Air took more than 15000 tokens to answer with 25min time.
So I increased max output tokens to 32k and timeout for API server to 1hr so that our bro has enough time to think 🤣
Highly underrated model. Tool calling (instruction following one) is also decent. (better than gpt-oss 120B)
logs
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| Model | Dataset | Metric | Subset | Num | Score | Cat.0 |
+===========================+===========+=================+==================+=======+=========+=========+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | computer science | 10 | 0.8 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | math | 10 | 0.9 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | chemistry | 10 | 0.8 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | engineering | 10 | 0.9 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | law | 10 | 0.5 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | biology | 10 | 0.9 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | health | 10 | 0.9 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | physics | 10 | 1 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | business | 10 | 0.8 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | philosophy | 10 | 0.9 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | economics | 10 | 0.9 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | other | 10 | 0.8 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | psychology | 10 | 0.8 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | history | 10 | 0.7 | default |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
| GLM-4.5-Air-UD-IQ2_M.gguf | mmlu_pro | AverageAccuracy | OVERALL | 140 | 0.8286 | - |
+---------------------------+-----------+-----------------+------------------+-------+---------+---------+
Hello, I would like to know which of the IQ2_KL model located at ubergarm/GLM-4.5-Air-GGUF, and the IQ2_M and Q2_K_XL models here, would be better. Thank you.
I have used unsloth's IQ2_M gguf (size: 44.3GB)