See DeepSeek-V3.1 5.5bit MLX in action in the demonstration video.

The q5.5 quant achieves a perplexity of 1.141 in our testing:
| Quantization | Perplexity |
|---|---|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |
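For reference, perplexity is the exponential of the average negative log-likelihood the model assigns to each token, so lower is better and 1.0 is a perfect score. A minimal sketch of the computation, using made-up per-token log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities (natural log) for illustration only
logprobs = [-0.05, -0.20, -0.10, -0.15]
print(round(perplexity(logprobs), 3))  # → 1.133
```

This is why the scores in the table bottom out just above 1.0: even the q8.5 quant assigns slightly less than full probability to each reference token.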
## Usage Notes
- Runs on a single M3 Ultra with 512GB RAM using the Inferencer app
- Memory usage: ~480 GB
- Expect ~13-19 tokens/s
- Quantized with a modified version of MLX 0.26
- For more details, see the demonstration video or visit DeepSeek-V3.1.
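The ~480 GB figure is consistent with a back-of-the-envelope size estimate from bits per weight. Assuming DeepSeek-V3.1's roughly 671B total parameters (an assumption not stated above), a rough sketch:

```python
def quant_size_gb(params_billion, bits_per_weight):
    """Approximate in-memory size of a quantized model: params * bits / 8."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# ~671B total parameters (assumed) at 5.5 bits per weight
print(round(quant_size_gb(671, 5.5)))  # → 461
```

The estimate of ~461 GB for the weights alone, plus KV cache and runtime overhead, lands near the reported ~480 GB peak.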
## Model Tree

inferencerlabs/deepseek-v3.1-MLX-5.5bit is quantized from the base model deepseek-ai/DeepSeek-V3.1.