inferencerlabs committed
Commit 8de437b · verified · 1 Parent(s): f4d6f34

Upload complete model

Files changed (1)
  1. README.md +17 -0
README.md CHANGED
@@ -7,3 +7,20 @@ tags:
  - mlx
  base_model: openai/gpt-oss-20b
  ---
+ **See gpt-oss-20b 6.5bit MLX in action - [demonstration video](https://youtube.com/xcreate)**
+
+ *q6.5bit quant typically achieves 1.128 perplexity in our testing, matching q8 perplexity (1.128).*
+ | Quantization | Perplexity |
+ |:------------:|:----------:|
+ | **q2** | 41.293 |
+ | **q3** | 1.900 |
+ | **q4** | 1.168 |
+ | **q6** | 1.128 |
+ | **q8** | 1.128 |
+
+ ## Usage Notes
+
+ * Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
+ * Peak memory usage: ~17 GB
+ * Expect ~100 tokens/s
+ * For more details, see the [demonstration video](https://youtube.com/xcreate) or visit [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b).
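For reference, the perplexity figures in the table above presumably follow the standard token-level definition (the commit does not specify the evaluation corpus or script), where N is the number of evaluated tokens:

$$
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\!\left(x_i \mid x_{<i}\right)\right)
$$

Under that reading, q6 and q8 landing on the same value to three decimal places (1.128) is what supports the claim that the 6.5-bit quant is effectively lossless relative to q8.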
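A minimal usage sketch, assuming the weights load through the standard mlx-lm `load`/`generate` API; the repo id below is a placeholder, and per the Usage Notes the modified MLX 0.26 build may be required:

```python
# Minimal sketch, assuming this quant works with the standard mlx-lm API.
# The repo id is a placeholder - substitute the actual Hub id or local path.
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/gpt-oss-20b-MLX-6.5bit")  # hypothetical repo id

# Build a chat-formatted prompt with the model's own template.
messages = [{"role": "user", "content": "Summarize what low-bit quantization trades off."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# ~100 tokens/s and ~17 GB peak memory are the figures quoted in the Usage Notes above.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
```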