See DeepSeek-V3.1 5.5bit MLX in action in the demonstration video.

The q5.5 quant achieves a perplexity of 1.141 in our testing:
| Quantization | Perplexity |
|---|---|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |
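For reference, perplexity is the exponential of the average negative log-likelihood the model assigns to each token, so lower is better and 1.0 is a perfect score. A minimal sketch of the computation, using made-up per-token log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities (natural log) for illustration only
logprobs = [-0.05, -0.20, -0.10, -0.15]
print(round(perplexity(logprobs), 3))  # → 1.133
```

This is why the scores in the table bottom out just above 1.0: even the q8.5 quant assigns slightly less than full probability to each reference token.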
## Usage Notes
- Runs on a single M3 Ultra with 512GB RAM using the Inferencer app
- Memory usage: ~480 GB
- Expect ~13-19 tokens/s
- Quantized with a modified version of MLX 0.26
- For more details, see the demonstration video or visit DeepSeek-V3.1.
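The ~480 GB figure is consistent with a back-of-the-envelope size estimate from bits per weight. Assuming DeepSeek-V3.1's roughly 671B total parameters (an assumption not stated above), a rough sketch:

```python
def quant_size_gb(params_billion, bits_per_weight):
    """Approximate in-memory size of a quantized model: params * bits / 8."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# ~671B total parameters (assumed) at 5.5 bits per weight
print(round(quant_size_gb(671, 5.5)))  # → 461
```

The estimate of ~461 GB for the weights alone, plus KV cache and runtime overhead, lands near the reported ~480 GB peak.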
## Model Tree

inferencerlabs/deepseek-v3.1-MLX-5.5bit is quantized from the base model deepseek-ai/DeepSeek-V3.1.