---
license: mit
library_name: mlx
base_model: deepseek-ai/DeepSeek-V3.1
tags:
- mlx
pipeline_tag: text-generation
---
|
**See DeepSeek-V3.1 5.5bit MLX in action - [demonstration video](https://youtu.be/ufXZI6aqOU8)**
|
|
|
*In our testing, the q5.5-bit quant typically achieves a perplexity of 1.141.*
|
| Quantization | Perplexity |
|:------------:|:----------:|
| **q2.5**     | 41.293     |
| **q3.5**     | 1.900      |
| **q4.5**     | 1.168      |
| **q5.5**     | 1.141      |
| **q6.5**     | 1.128      |
| **q8.5**     | 1.128      |
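The figures above read as the usual definition of perplexity: the exponential of the mean negative log-likelihood per token. A minimal sketch of that computation (illustrative only; the `token_logprobs` input is an assumption, not our actual evaluation harness):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood).

    token_logprobs: the model's natural-log probability of each
    ground-truth token in the evaluation text.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns p = 0.88 to every token scores
# exp(-ln 0.88) ~= 1.136, close to the q5.5 figure above.
print(perplexity([math.log(0.88)] * 100))
```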
|
|
|
## Usage Notes
|
|
|
* Runs on a single M3 Ultra with 512 GB of RAM using the [Inferencer app](https://inferencer.com); a plain `mlx-lm` sketch is shown below
* Memory usage: ~480 GB
* Expect ~13-19 tokens/s
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
* For more details, see the [demonstration video](https://youtu.be/ufXZI6aqOU8) or visit [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1).
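
If you would rather drive the model from Python than through Inferencer, the stock [mlx-lm](https://github.com/ml-explore/mlx-lm) loader is a reasonable starting point. A minimal sketch, with two caveats: the repo path below is a placeholder for this model's actual Hugging Face ID, and a fractional-bit quant like q5.5 may require the modified MLX build noted above rather than a stock install.

```python
# Sketch only: assumes `pip install mlx-lm` on Apple silicon with
# enough unified memory (~480 GB for this quant).
from mlx_lm import load, generate

# Placeholder ID -- substitute this repository's actual path.
model, tokenizer = load("path/to/DeepSeek-V3.1-5.5bit-mlx")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the MLX project in one sentence."}],
    add_generation_prompt=True,
)

# Streams tokens to stdout when verbose=True; expect ~13-19 tokens/s
# on an M3 Ultra per the notes above.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```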