---
license: mit
library_name: mlx
base_model: deepseek-ai/DeepSeek-V3.1
tags:
- mlx
pipeline_tag: text-generation
---
**See DeepSeek-V3.1 5.5bit MLX in action - [demonstration video](https://youtu.be/ufXZI6aqOU8)**
*The q5.5-bit quant typically achieves a perplexity of 1.141 in our testing (lower is better):*

| Quantization | Perplexity |
|:------------:|:----------:|
| **q2.5** | 41.293 |
| **q3.5** | 1.900 |
| **q4.5** | 1.168 |
| **q5.5** | 1.141 |
| **q6.5** | 1.128 |
| **q8.5** | 1.128 |
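
The card does not spell out the evaluation setup, but as a rough illustration of how a perplexity figure like this can be computed with MLX, the sketch below loads the quantized weights with `mlx-lm` and scores a placeholder text file. The model path, evaluation corpus, and context length are assumptions, not the settings used for the table above.

```python
# Rough perplexity sketch with mlx-lm.
# The model path, eval corpus, and 2048-token window are placeholders,
# not the configuration used to produce the table above.
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("path/to/DeepSeek-V3.1-5.5bit-mlx")  # local path or repo id

text = open("eval.txt").read()                      # placeholder evaluation text
tokens = mx.array(tokenizer.encode(text))[:2048]    # truncate to one context window

logits = model(tokens[None, :-1])                   # next-token logits per position
loss = nn.losses.cross_entropy(
    logits.reshape(-1, logits.shape[-1]),           # (seq, vocab)
    tokens[1:],                                     # shifted targets
    reduction="mean",
)
print("perplexity:", mx.exp(loss).item())
```
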
## Usage Notes
* Runs on a single M3 Ultra with 512 GB RAM using the [Inferencer app](https://inferencer.com)
* Memory usage: ~480 GB
* Expect ~13-19 tokens/s
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
* For more details see the [demonstration video](https://youtu.be/ufXZI6aqOU8) or visit [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1).
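
Besides the Inferencer app, MLX-format weights like these can normally be loaded directly with [mlx-lm](https://github.com/ml-explore/mlx-lm). The following is a minimal sketch of that standard loading pattern; the model path is a placeholder for this repository's id, and the prompt is only an example.

```python
# Minimal generation sketch with mlx-lm; replace the path with this repo's id.
from mlx_lm import load, generate

model, tokenizer = load("path/to/DeepSeek-V3.1-5.5bit-mlx")

prompt = "Explain the trade-off between 4.5-bit and 5.5-bit quantization."

# Use the model's chat template if one is bundled with the tokenizer.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```

Note that loading the full q5.5 weights requires roughly the ~480 GB of memory mentioned above, so this is only practical on a machine such as the 512 GB M3 Ultra.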