Absurd sizes.
It's absurd that the different quantizations all come out at the same 11 GB size.
I don't see the advantage.
Also:
the quantized versions don't work well in llama.cpp
For quantizing, llama.cpp has limitations at the moment, and I think they're working on fixing them. Once that's done we can make proper quants for it in many different sizes :)
Could you explain what you mean by "they don't work well"? Accuracy? Speed?
(Note: I wrote this before I realised the same points are discussed in more detail in https://huggingface.co/unsloth/gpt-oss-20b-GGUF/discussions/2.) Besides, the sizes for fixed bit-width quantisations don't add up: a 20B model at 16 bits should be around 40 GB, and at 8 bits at least 20 GB. Edit: I just read in the other thread that it seems to be generated from an FP4 original. The size calculations still apply, but they could be completely irrelevant if there isn't any more information than 4 bits per parameter anyway (and it is not obvious to me how any quant above 4 bits could make sense, at least not information-wise - though maybe for utilizing specific hardware optimizations).
Yes, I understand... but the Q2 size should be almost half of the Q4 size, for example.
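As a rough back-of-the-envelope sketch of what those sizes "should" be, assuming roughly 20B parameters stored at a uniform bit-width and ignoring metadata, embeddings, and per-block scale overhead (so real GGUF files would be somewhat larger):

```python
# Rough expected file sizes for a ~20B-parameter model at uniform bit-widths.
# Assumption: every parameter is stored at exactly `bits` bits, no overhead.
PARAMS = 20e9  # assumed parameter count

for bits in (16, 8, 4, 2):
    size_gb = PARAMS * bits / 8 / 1e9  # bytes -> GB (decimal)
    print(f"{bits:>2}-bit: ~{size_gb:.0f} GB")

# Expected output:
# 16-bit: ~40 GB
#  8-bit: ~20 GB
#  4-bit: ~10 GB
#  2-bit: ~5 GB
```

Under that assumption, Q2 would indeed be about half of Q4, and neither would land at the same 11 GB as the 16-bit file, which is why the identical sizes look suspicious unless the original weights really only carry ~4 bits of information per parameter.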