MXFP4_MOE
Please add MXFP4_MOE.
As far as we know, the modified GPT-OSS can no longer support MXFP4. If you know of any solution for supporting MXFP4, please let us know.
Best,
Jinx Team.
Hi! It exists; someone made one and I downloaded it, but then deleted it due to lack of space. As a result, I had decided to focus on it as the optimal option, but it was no longer on Hugging Face; it was probably deleted.
Thanks for your information. I found that it does support MXFP4_MOE, and I converted an MXFP4_MOE version. Then I manually tested some queries: the safety filter came back out of nowhere, and I don't know why. So I am skipping MXFP4_MOE for now. Maybe you can try it yourself.
https://github.com/ggml-org/llama.cpp/blob/618575c5825d7d4f170e686e772178d2aae148ae/tools/quantize/quantize.cpp#L25
Best,
Jinx Team
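
For reference, a minimal sketch of the usual llama.cpp flow for producing such a quant; paths and file names below are placeholders, not the exact commands the Jinx team ran:

    # convert the HF checkpoint to GGUF (F16 output by default)
    python convert_hf_to_gguf.py ./Jinx-gpt-oss-20b --outfile jinx-gpt-oss-20b-f16.gguf
    # quantize to MXFP4_MOE, one of the types listed in the quantize.cpp linked above
    ./llama-quantize jinx-gpt-oss-20b-f16.gguf jinx-gpt-oss-20b-MXFP4_MOE.gguf MXFP4_MOE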
Try adding --tensor-type ".*ffn_gate_exps.weight=Q8_0" (or Q4_1).
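
If I am reading the llama-quantize options right, that would fit into the full command roughly like this (file names are placeholders); the --tensor-type override pins the matched expert tensors to a higher-precision type while everything else still goes to MXFP4_MOE:

    # keep the MoE FFN gate expert weights in Q8_0 (or Q4_1), quantize the rest to MXFP4_MOE
    ./llama-quantize --tensor-type ".*ffn_gate_exps.weight=Q8_0" jinx-gpt-oss-20b-f16.gguf jinx-gpt-oss-20b-mixed.gguf MXFP4_MOE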
It is notable that the "safety" filter coming back with MXFP4 was also mentioned for another uncensored model: https://huggingface.co/huizimao/gpt-oss-20b-uncensored-mxfp4 (their model card notes that the refusal rate increases many times over when MXFP4 is used). So this may be a general problem of GPT-OSS that is not specific to Jinx. My guess is that because the model was fine-tuned in BF16, part of the fine-tuning is lost to rounding errors when quantizing to MXFP4; for best results, the MXFP4 quant may need further direct fine-tuning of its MXFP4 weights after quantization.
In any case, thanks to Jinx for sharing a great uncensored model; I think this is the first uncensored GPT-OSS model that properly preserved its intelligence. It would be great if a 120B version were made too, if possible (even with just standard quantization, it would still be awesome).
Try upcasting the model weights to F32 (convert the model to F32 instead of BF16), and then quantize to GGUF MXFP4_MOE. 😋
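
A sketch of that recipe with the standard llama.cpp tools, again with placeholder paths; --outtype f32 is what forces the F32 upcast at conversion time:

    # convert with F32 output instead of the default F16
    python convert_hf_to_gguf.py ./Jinx-gpt-oss-20b --outtype f32 --outfile jinx-gpt-oss-20b-f32.gguf
    # then quantize the F32 GGUF down to MXFP4_MOE
    ./llama-quantize jinx-gpt-oss-20b-f32.gguf jinx-gpt-oss-20b-MXFP4_MOE.gguf MXFP4_MOE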
https://huggingface.co/Joseph717171/Jinx-gpt-OSS-20B-GGUF
Thanks to the Jinx team for your wonderful work on GPT-OSS-20B. I upcasted the model weights to F32 and quantized to GGUF MXFP4_MOE.