| --- |
| license: apache-2.0 |
| pipeline_tag: text-generation |
| library_name: mlx |
| tags: |
| - vllm |
| - mlx |
| base_model: openai/gpt-oss-120b |
| --- |
| **See gpt-oss-120b 6.5bit MLX in action - [demonstration video](https://youtube.com/xcreate)** |
|
|
*The q6.5-bit quant typically achieves a perplexity of 1.128 in our testing, matching the q8 result (1.128).*
| | Quantization | Perplexity | |
| |:------------:|:----------:| |
| | **q2** | 41.293 | |
| | **q3** | 1.900 | |
| | **q4** | 1.168 | |
| | **q6** | 1.128 | |
| | **q8** | 1.128 | |
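
Below is a rough sketch of how a perplexity figure like those in the table above could be measured with mlx-lm. The repo path and evaluation text are placeholders, and the exact evaluation setup used for this card may differ.

```python
# Sketch only: placeholder repo path and toy evaluation text.
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("path/to/gpt-oss-120b-6.5bit-mlx")  # placeholder path

# Replace with a held-out evaluation corpus for meaningful numbers.
text = "MLX is an array framework for machine learning on Apple silicon."
tokens = mx.array([tokenizer.encode(text)])

# Average next-token negative log-likelihood over the sequence, then exponentiate.
logits = model(tokens[:, :-1])
log_probs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
nll = -mx.take_along_axis(log_probs, tokens[:, 1:, None], axis=-1).mean()
print(f"Perplexity: {mx.exp(nll).item():.3f}")
```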
|
|
| ## Usage Notes |
|
|
| * Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26 |
| * Memory usage: ~95 GB |
| * Expect ~60 tokens/s |
* For more details see the [demonstration video](https://youtube.com/xcreate) or visit [OpenAI gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
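
A minimal usage sketch with the mlx-lm Python API follows; the repo path is a placeholder for wherever this 6.5-bit conversion is hosted, not an official identifier.

```python
# Sketch only: placeholder repo path.
from mlx_lm import load, generate

model, tokenizer = load("path/to/gpt-oss-120b-6.5bit-mlx")  # placeholder path

messages = [{"role": "user", "content": "Summarize what 6.5-bit quantization trades off versus q8."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generation speed of roughly ~60 tokens/s was observed (see notes above).
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```

The same can be done from the command line with `mlx_lm.generate --model <repo> --prompt "..."`.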