Absurd sizes.
It's absurd that the different quantizations all come out at the same 11 GB size.
I don't see the advantage.
Also:
the quantized versions don't work well in llama.cpp
For quantizing, llama.cpp has limitations at the moment, and I think they're working on fixing them. Once that's done we can make proper quants for it in many different sizes :)
Could you explain what you mean by "they don't work well"? Accuracy? Speed?
(Note: I wrote this before I realised the same points are discussed in more detail in https://huggingface.co/unsloth/gpt-oss-20b-GGUF/discussions/2.) Besides, the sizes for fixed bit-width quantisations don't add up: a 20B model at 16 bits should be around 40 GB, and at 8 bits at least 20 GB. Edit: I just read in the other thread that it seems to be generated from an FP4 original. The size calculations still apply, but they could be completely irrelevant if there isn't any more information than 4 bits per parameter anyway (and it is not obvious to me how any quant above 4 bits could make sense, at least not information-wise - though maybe for utilizing specific hardware optimizations).
Yes, I understand... but the Q2 size should be almost half of the Q4 size, for example.
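As a rough back-of-the-envelope sketch of what those sizes "should" be, assuming roughly 20B parameters stored at a uniform bit-width and ignoring metadata, embeddings, and per-block scale overhead (so real GGUF files would be somewhat larger):

```python
# Rough expected file sizes for a ~20B-parameter model at uniform bit-widths.
# Assumption: every parameter is stored at exactly `bits` bits, no overhead.
PARAMS = 20e9  # assumed parameter count

for bits in (16, 8, 4, 2):
    size_gb = PARAMS * bits / 8 / 1e9  # bytes -> GB (decimal)
    print(f"{bits:>2}-bit: ~{size_gb:.0f} GB")

# Expected output:
# 16-bit: ~40 GB
#  8-bit: ~20 GB
#  4-bit: ~10 GB
#  2-bit: ~5 GB
```

Under that assumption, Q2 would indeed be about half of Q4, and neither would land at the same 11 GB as the 16-bit file, which is why the identical sizes look suspicious unless the original weights really only carry ~4 bits of information per parameter.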