See gpt-oss-20b 6.5bit MLX in action - demonstration video
In our testing, the q6.5bit quant typically achieves 1.128 perplexity, matching q8 (see the table below).
| Quantization | Perplexity |
|---|---|
| q2 | 41.293 |
| q3 | 1.900 |
| q4 | 1.168 |
| q6 | 1.128 |
| q8 | 1.128 |
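For reference, here is a minimal sketch of how a perplexity number like those above can be computed with mlx-lm. The evaluation text, tokenization, and settings are placeholders and assumptions, not necessarily the methodology behind this table:

```python
# Minimal perplexity sketch using mlx-lm; the evaluation text and
# settings below are placeholders, not the card's exact methodology.
import math

import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("inferencerlabs/openai-gpt-oss-20b-MLX-6.5bit")

text = "Your held-out evaluation text goes here."
tokens = tokenizer.encode(text)

inputs = mx.array(tokens)[None, :-1]   # tokens fed to the model
targets = mx.array(tokens)[None, 1:]   # next-token labels

logits = model(inputs)                 # (1, seq_len - 1, vocab_size)
# Log-probability the model assigns to each correct next token.
logprobs = mx.take_along_axis(
    logits - mx.logsumexp(logits, axis=-1, keepdims=True),
    targets[..., None],
    axis=-1,
)
ppl = math.exp(-logprobs.mean().item())
print(f"perplexity: {ppl:.3f}")
```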
## Usage Notes
- Tested to run with the Inferencer app (see the loading sketch after this list)
- Memory usage: ~17 GB (down from the ~46 GB required by the native MXFP4 format)
- Expect ~100 tokens/s
- Quantized with a modified version of MLX 0.26
- For more details, see the demonstration video or visit OpenAI gpt-oss-20b.
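If the weights also load with a stock mlx-lm install (an assumption — the card only says quantization used a modified MLX 0.26), a basic generation call looks like this:

```python
# Hedged sketch: load the 6.5bit weights and generate with mlx-lm.
# Compatibility with stock mlx-lm is an assumption; the quantization
# itself was done with a modified MLX 0.26 per the notes above.
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/openai-gpt-oss-20b-MLX-6.5bit")
prompt = "Explain 6.5-bit quantization in one paragraph."
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```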
Base model: openai/gpt-oss-20b