inferencerlabs committed
Commit 8de437b · verified · 1 Parent(s): f4d6f34

Upload complete model

Files changed (1)
  1. README.md +17 -0
README.md CHANGED
@@ -7,3 +7,20 @@ tags:
  - mlx
  base_model: openai/gpt-oss-20b
  ---
+ **See gpt-oss-20b 6.5bit MLX in action - [demonstration video](https://youtube.com/xcreate)**
+
+ *q6.5bit quant typically achieves 1.128 perplexity in our testing, matching q8 perplexity (1.128).*
+ | Quantization | Perplexity |
+ |:------------:|:----------:|
+ | **q2** | 41.293 |
+ | **q3** | 1.900 |
+ | **q4** | 1.168 |
+ | **q6** | 1.128 |
+ | **q8** | 1.128 |
+
+ ## Usage Notes
+
+ * Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
+ * Peak memory usage: ~17 GB
+ * Expect ~100 tokens/s
+ * For more details, see the [demonstration video](https://youtube.com/xcreate) or visit [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b).
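For reference, the perplexity figures in the table above presumably follow the standard token-level definition (the commit does not specify the evaluation corpus or script), where N is the number of evaluated tokens:

$$
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\!\left(x_i \mid x_{<i}\right)\right)
$$

Under that reading, q6 and q8 landing on the same value to three decimal places (1.128) is what supports the claim that the 6.5-bit quant is effectively lossless relative to q8.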
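A minimal usage sketch, assuming the weights load through the standard mlx-lm `load`/`generate` API; the repo id below is a placeholder, and per the Usage Notes the modified MLX 0.26 build may be required:

```python
# Minimal sketch, assuming this quant works with the standard mlx-lm API.
# The repo id is a placeholder - substitute the actual Hub id or local path.
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/gpt-oss-20b-MLX-6.5bit")  # hypothetical repo id

# Build a chat-formatted prompt with the model's own template.
messages = [{"role": "user", "content": "Summarize what low-bit quantization trades off."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# ~100 tokens/s and ~17 GB peak memory are the figures quoted in the Usage Notes above.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
```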