inferencerlabs/deepseek-v3.1-MLX-5.5bit


metadata

license: mit
library_name: mlx
base_model: deepseek-ai/DeepSeek-V3.1
tags:
  - mlx
pipeline_tag: text-generation

See DeepSeek-V3.1 5.5-bit MLX in action: demonstration video

The q5.5 quant typically achieves a perplexity of 1.141 in our testing (lower is better).

| Quantization | Perplexity |
|---|---|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |
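The perplexity figures above follow the standard definition: the exponential of the mean per-token negative log-likelihood. The Python sketch below shows that definition and expresses each reported value as a percentage above q8.5. The evaluation data and procedure behind these numbers are not described on this card, so this only illustrates the metric, not how the numbers were measured; the names `perplexity`, `relative_gap`, and `REPORTED` are hypothetical helpers.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Perplexities copied from the table above (lower is better).
REPORTED = {"q2.5": 41.293, "q3.5": 1.900, "q4.5": 1.168,
            "q5.5": 1.141, "q6.5": 1.128, "q8.5": 1.128}

def relative_gap(quant, reference="q8.5"):
    """Percent increase in perplexity of `quant` over the reference quant."""
    ref = REPORTED[reference]
    return 100.0 * (REPORTED[quant] - ref) / ref

if __name__ == "__main__":
    # A model that gives every token probability 0.5 has perplexity ~2.0.
    print(perplexity([math.log(0.5)] * 4))
    for q, ppl in REPORTED.items():
        print(f"{q}: {ppl:.3f} ({relative_gap(q):+.2f}% vs q8.5)")
```

On these numbers q5.5 sits about 1.15% above q8.5, while q3.5 and q2.5 degrade sharply.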

Usage Notes

- Runs on a single M3 Ultra with 512 GB of RAM using the Inferencer app
- Memory usage: ~480 GB
- Expect ~13-19 tokens/s
- Quantized with a modified version of MLX 0.26
- For more details, see the demonstration video or visit DeepSeek-V3.1.
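As a rough consistency check on the usage notes, the arithmetic below estimates weight memory and a bandwidth-bound ceiling on decode speed. It rests on assumptions not stated on this card: about 671B total and 37B active parameters per token (the published DeepSeek-V3 architecture), "5.5 bit" meaning 5-bit MLX-style affine quantization that also stores a 16-bit scale and 16-bit bias per group of 64 weights, and roughly 819 GB/s of memory bandwidth on the M3 Ultra. All helper names are hypothetical.

```python
# Back-of-envelope estimates. ASSUMPTIONS (not stated on this card):
# ~671e9 total / ~37e9 active params per token (DeepSeek-V3 architecture),
# 16-bit scale + 16-bit bias per 64-weight group, ~819 GB/s bandwidth.
TOTAL_PARAMS = 671e9
ACTIVE_PARAMS = 37e9
BANDWIDTH_GB_S = 819.0

def effective_bits(bits, group_size=64, scale_bits=16, bias_bits=16):
    """Bits per weight including per-group scale and bias overhead."""
    return bits + (scale_bits + bias_bits) / group_size

def weight_gb(params, bits_per_weight):
    """Weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

def decode_ceiling_tok_s(active_params, bits_per_weight, bandwidth_gb_s):
    """Upper bound on tokens/s if each active weight is read once per token."""
    return bandwidth_gb_s / weight_gb(active_params, bits_per_weight)

def generation_seconds(n_tokens, tok_per_s):
    """Wall-clock time to generate n_tokens at a steady rate."""
    return n_tokens / tok_per_s

if __name__ == "__main__":
    bpw = effective_bits(5)  # 5 + 32/64 = 5.5 bits per weight
    print(f"weights: ~{weight_gb(TOTAL_PARAMS, bpw):.0f} GB")
    ceiling = decode_ceiling_tok_s(ACTIVE_PARAMS, bpw, BANDWIDTH_GB_S)
    print(f"decode ceiling: ~{ceiling:.0f} tok/s")
    for rate in (13, 19):
        print(f"1,000 tokens at {rate} tok/s: ~{generation_seconds(1000, rate):.0f} s")
```

Under these assumptions the weights alone come to roughly 461 GB, which would leave the rest of the ~480 GB figure for the KV cache, activations, and runtime overhead. The estimated ceiling of about 32 tok/s sits above the observed 13-19 tokens/s, which is plausible since decoding rarely sustains peak memory bandwidth.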