MXFP4_MOE

#1
by marcelone - opened

Please add MXFP4_MOE.

Jinx org

As far as we know, the modified GPT-OSS can no longer support MXFP4. If you know of any solution for supporting MXFP4, please let us know.

Best,
Jinx Team.

Hi! It does exist; someone made one and I downloaded it, but then deleted it due to lack of space.

As a result, I had also decided to settle on that quant, since it was the optimal choice, but it is no longer on Hugging Face; it was probably deleted.

Your Jinx versions are great! They support thinking in Russian, which helps a lot in my work.
The original GPT-OSS thinks only in English.

Thanks to the JINX team!

Jinx org

Thanks for the information. I found that it does support MXFP4_MOE, so I converted an MXFP4_MOE version and manually tested some queries. The safety filter comes back out of nowhere, and I don't know why, so I'm skipping MXFP4_MOE for now. Maybe you can try it yourself.
https://github.com/ggml-org/llama.cpp/blob/618575c5825d7d4f170e686e772178d2aae148ae/tools/quantize/quantize.cpp#L25

Best,
Jinx Team

Try adding --tensor-type ".*ffn_gate_exps.weight=Q8_0" (or Q4_1) when quantizing.
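
For reference, the full invocation would look roughly like this. This is only a sketch: the file names are placeholders, and it assumes a recent llama.cpp build whose llama-quantize supports the --tensor-type per-tensor override and MXFP4_MOE as a target type (see the quantize.cpp link above).

```bash
# Sketch: quantize to MXFP4_MOE while keeping the MoE FFN gate weights
# at a higher-precision type (Q8_0 here; Q4_1 is the other option suggested above).
# File names are placeholders.
./llama-quantize \
  --tensor-type ".*ffn_gate_exps.weight=Q8_0" \
  Jinx-gpt-oss-20b-f32.gguf \
  Jinx-gpt-oss-20b-MXFP4_MOE.gguf \
  MXFP4_MOE
```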

It is notable that the "safety" filter coming back with MXFP4 has also been mentioned for another uncensored model: https://huggingface.co/huizimao/gpt-oss-20b-uncensored-mxfp4 (the model card notes that the refusal rate increases many times over when MXFP4 is used). So this may be a general GPT-OSS problem rather than something specific to Jinx. My guess is that the model was fine-tuned in BF16, and after quantizing to MXFP4 part of the fine-tuning is lost to rounding errors; for best results, MXFP4 probably requires further fine-tuning directly on the MXFP4 weights after quantization.

In any case, thanks to Jinx for sharing a great uncensored model; I think this is the first uncensored GPT-OSS model that properly preserves its intelligence. It would be great to have a 120B version too, if possible (even with just standard quantization it would still be awesome).

Try upcasting the model weights to F32 (convert the model to F32 instead of BF16), and then quantize to GGUF MXFP4_MOE. 😋

https://huggingface.co/Joseph717171/Jinx-gpt-OSS-20B-GGUF

Thanks, Jinx team, for your wonderful work on GPT-OSS-20B. I upcasted the model weights to F32 and quantized to GGUF MXFP4_MOE.
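
For anyone who wants to reproduce this, the steps would look roughly like the following. This is a sketch assuming the stock llama.cpp conversion script and quantize tool; the model path and output file names are placeholders.

```bash
# 1) Convert the HF checkpoint to GGUF, upcasting the weights to F32
#    instead of the usual BF16/F16 output type.
python convert_hf_to_gguf.py /path/to/Jinx-gpt-oss-20b \
  --outtype f32 \
  --outfile Jinx-gpt-oss-20b-f32.gguf

# 2) Quantize the F32 GGUF to MXFP4_MOE.
./llama-quantize Jinx-gpt-oss-20b-f32.gguf Jinx-gpt-oss-20b-MXFP4_MOE.gguf MXFP4_MOE
```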
