---
license: mit
library_name: mlx
base_model: deepseek-ai/DeepSeek-V3.1
tags:
- mlx
pipeline_tag: text-generation
---
**See DeepSeek-V3.1 5.5bit MLX in action - [demonstration video](https://youtu.be/ufXZI6aqOU8)**

*The q5.5-bit quant achieves a perplexity of 1.141 in our testing:*

| Quantization | Perplexity |
|:------------:|:----------:|
| **q2.5**     | 41.293     |
| **q3.5**     | 1.900      |
| **q4.5**     | 1.168      |
| **q5.5**     | 1.141      |
| **q6.5**     | 1.128      |
| **q8.5**     | 1.128      |

## Usage Notes

* Runs on a single M3 Ultra with 512 GB of RAM using the [Inferencer app](https://inferencer.com); a minimal `mlx-lm` loading sketch is shown below
* Memory usage: ~480 GB
* Expect ~13-19 tokens/s
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
* For more details, see the [demonstration video](https://youtu.be/ufXZI6aqOU8) or visit [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1).
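
If you prefer to run the model programmatically instead of through the Inferencer app, the snippet below is a minimal sketch using the `mlx-lm` Python package. The repository id `your-username/DeepSeek-V3.1-5.5bit-mlx` is a placeholder (the exact repo name is not stated here), and the prompt and generation parameters are illustrative only.

```python
# Minimal sketch: load the 5.5-bit MLX quant and generate text with mlx-lm.
# The repo id below is a placeholder -- replace it with this model's actual
# Hugging Face repository id. Requires `pip install mlx-lm` on Apple silicon
# with enough unified memory for the weights (~480 GB for this quant).
from mlx_lm import load, generate

model, tokenizer = load("your-username/DeepSeek-V3.1-5.5bit-mlx")

prompt = "Explain the trade-off between 4-bit and 5.5-bit quantization."

# Apply the model's chat template if the tokenizer provides one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```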