inferencerlabs/openai-gpt-oss-20b-MLX-6.5bit

inferencerlabs committed on Aug 6

Commit 55a922a · verified · 1 Parent(s): f566bd4

Upload complete model

Files changed (1)

README.md +3 -3

@@ -7,7 +7,7 @@ tags:
 - mlx
 base_model: openai/gpt-oss-20b
 ---
-**See gpt-oss-20b 6.5bit MLX in action - [demonstration video](https://youtube.com/xcreate)**
+**See gpt-oss-20b 6.5bit MLX in action - [demonstration video](https://youtu.be/mlpFG8e_fLw)**
 *q6.5bit quant typically achieves 1.128 perplexity in our testing which is equivalent to q8.*
 | Quantization | Perplexity |
@@ -21,6 +21,6 @@ base_model: openai/gpt-oss-20b
 ## Usage Notes
 * Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
-* Peak memory usage: ~17 GB
+* Memory usage: ~17 GB
 * Expect ~100 tokens/s
-* For more details see [demonstration video](https://youtube.com/xcreate) or visit [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b).
+* For more details see [demonstration video](https://youtu.be/mlpFG8e_fLw) or visit [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b).
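
As a rough sanity check on the ~17 GB memory figure in the usage notes, the weight storage of a quantized model can be approximated as parameters × bits per weight ÷ 8. The sketch below is not from the README: the parameter count of about 21 billion is an assumption, the function name is hypothetical, and the estimate ignores the KV cache, activations, and any runtime overhead.

```python
def quantized_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption (not stated in the README): roughly 21e9 total parameters.
ASSUMED_PARAMS = 21e9

# 6.5 bits per weight is taken from the quantization named in the model card.
estimate = quantized_weight_gb(ASSUMED_PARAMS, 6.5)
print(f"Estimated weight storage: {estimate:.1f} GB")
```

Under that assumed parameter count, the estimate lands close to the ~17 GB quoted in the README, which suggests the figure is dominated by the weights themselves.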