MXFP4_MOE

#1
by marcelone - opened

Please add MXFP4_MOE.

Jinx org

As far as we know, the modified GPT-OSS can no longer support MXFP4. If you know of any solution for supporting MXFP4, please let us know.

Best,
Jinx Team.

Hi! It does exist; someone made one and I downloaded it, but then deleted it due to lack of space.

As a result, I had also decided to settle on that quant, since it was the optimal choice, but it is no longer on Hugging Face; it was probably deleted.

Your Jinx versions are great! They support thinking in Russian, which helps a lot in my work.
The original GPT-OSS thinks only in English.

Thanks to the JINX team!

Jinx org

Thanks for the information. I found that it does support MXFP4_MOE, so I converted an MXFP4_MOE version and manually tested some queries. The safety filter comes back out of nowhere, and I don't know why, so I'm skipping MXFP4_MOE for now. Maybe you can try it yourself.
https://github.com/ggml-org/llama.cpp/blob/618575c5825d7d4f170e686e772178d2aae148ae/tools/quantize/quantize.cpp#L25

Best,
Jinx Team

Try adding --tensor-type ".*ffn_gate_exps.weight=Q8_0" (or Q4_1) when quantizing.
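
For reference, the full invocation would look roughly like this. This is only a sketch: the file names are placeholders, and it assumes a recent llama.cpp build whose llama-quantize supports the --tensor-type per-tensor override and MXFP4_MOE as a target type (see the quantize.cpp link above).

```bash
# Sketch: quantize to MXFP4_MOE while keeping the MoE FFN gate weights
# at a higher-precision type (Q8_0 here; Q4_1 is the other option suggested above).
# File names are placeholders.
./llama-quantize \
  --tensor-type ".*ffn_gate_exps.weight=Q8_0" \
  Jinx-gpt-oss-20b-f32.gguf \
  Jinx-gpt-oss-20b-MXFP4_MOE.gguf \
  MXFP4_MOE
```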

It is notable that the "safety" filter coming back with MXFP4 has also been mentioned for another uncensored model: https://huggingface.co/huizimao/gpt-oss-20b-uncensored-mxfp4 (the model card notes that the refusal rate increases many times over when MXFP4 is used). So this may be a general GPT-OSS problem rather than something specific to Jinx. My guess is that the model was fine-tuned in BF16, and after quantizing to MXFP4 part of the fine-tuning is lost to rounding errors; for best results, MXFP4 probably requires further fine-tuning directly on the MXFP4 weights after quantization.

In any case, thanks to Jinx for sharing a great uncensored model; I think this is the first uncensored GPT-OSS model that properly preserves its intelligence. It would be great to have a 120B version too, if possible (even with just standard quantization it would still be awesome).

Try upcasting the model weights to F32 (convert the model to F32 instead of BF16), and then quantize to GGUF MXFP4_MOE. 😋

https://huggingface.co/Joseph717171/Jinx-gpt-OSS-20B-GGUF

Thanks, Jinx team, for your wonderful work on GPT-OSS-20B. I upcasted the model weights to F32 and quantized to GGUF MXFP4_MOE.
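
For anyone who wants to reproduce this, the steps would look roughly like the following. This is a sketch assuming the stock llama.cpp conversion script and quantize tool; the model path and output file names are placeholders.

```bash
# 1) Convert the HF checkpoint to GGUF, upcasting the weights to F32
#    instead of the usual BF16/F16 output type.
python convert_hf_to_gguf.py /path/to/Jinx-gpt-oss-20b \
  --outtype f32 \
  --outfile Jinx-gpt-oss-20b-f32.gguf

# 2) Quantize the F32 GGUF to MXFP4_MOE.
./llama-quantize Jinx-gpt-oss-20b-f32.gguf Jinx-gpt-oss-20b-MXFP4_MOE.gguf MXFP4_MOE
```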
