metadata
license: mit
library_name: mlx
base_model: deepseek-ai/DeepSeek-V3.1
tags:
  - mlx
pipeline_tag: text-generation

----CURRENTLY UPLOADING FILES-----

This notice will be removed once all files have been uploaded.

Notes

See DeepSeek-V3.1 5.5bit MLX in action in the demonstration video.

The q5.5 quantization typically achieves a perplexity of 1.141 in our testing.

Quantization   Perplexity
q2.5           41.293
q3.5           1.900
q4.5           1.168
q5.5           1.141
q6.5           1.128
q8.5           1.128
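
This card does not say how the perplexity figures above were measured. As a reference point only, the sketch below uses the standard definition, the exponential of the mean negative log-likelihood per token. The function name and the sample log-probabilities are hypothetical and are not taken from this model or its evaluation setup.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical per-token log-probabilities, for illustration only.
example_logprobs = [math.log(0.9), math.log(0.8), math.log(0.95)]
print(f"perplexity: {perplexity(example_logprobs):.3f}")
```

A perplexity of 1.0 would mean the model assigns probability 1 to every observed token, so values close to 1 in the table indicate very confident predictions on the test text.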

Usage Notes

  • Runs on a single M3 Ultra with 512 GB of RAM
  • Memory usage: ~480 GB
  • Expect ~13-19 tokens/s
  • Built with a modified version of MLX 0.26
  • For more details, see the demonstration video or visit DeepSeek-V3.1.
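
The memory figure above can be sanity-checked with a rough back-of-the-envelope estimate of weight storage: parameters times bits per weight, divided by 8. This is a sketch under stated assumptions, not the method used to produce the ~480 GB number. It assumes roughly 671 billion total parameters (the figure DeepSeek publishes for the V3 family, not stated in this card) and treats the quantization label as the effective average bits per weight. The helper name is hypothetical.

```python
def estimate_weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: ~671e9 total parameters; this value does not come from this card.
ASSUMED_PARAMS = 671e9

for bits in (4.5, 5.5, 6.5, 8.5):
    size_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, bits)
    print(f"q{bits}: ~{size_gb:.0f} GB of weights")
```

Under these assumptions the 5.5-bit weights alone come to roughly 461 GB, which is in the same range as the ~480 GB reported above once the KV cache and runtime overhead are added. The estimate ignores those extras, so treat it as a lower bound rather than a prediction.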