---
license: mit
library_name: mlx
base_model: deepseek-ai/DeepSeek-V3.1
tags:
- mlx
pipeline_tag: text-generation
---

**See DeepSeek-V3.1 5.5bit MLX in action - [demonstration video](https://youtu.be/ufXZI6aqOU8)**

*The q5.5bit quant achieves a perplexity of 1.141 in our testing:*

| Quantization | Perplexity |
|:------------:|:----------:|
| **q2.5** | 41.293 |
| **q3.5** | 1.900 |
| **q4.5** | 1.168 |
| **q5.5** | 1.141 |
| **q6.5** | 1.128 |
| **q8.5** | 1.128 |

## Usage Notes

* Runs on a single M3 Ultra with 512 GB RAM using the [Inferencer app](https://inferencer.com)
* Memory usage: ~480 GB
* Expect ~13-19 tokens/s
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
* For more details, see the [demonstration video](https://youtu.be/ufXZI6aqOU8) or visit [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1).
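
Outside the Inferencer app, MLX-format quants can usually also be loaded with the standard `mlx-lm` Python tooling. Below is a minimal sketch, not an endorsed workflow: the repo id is a placeholder, and since this quant was produced with a modified MLX 0.26 build, a stock `mlx-lm` install may need a matching version to load it.

```python
# Minimal sketch: loading and prompting an MLX quant with mlx-lm.
# Assumptions: the weights are published under the placeholder repo id
# below, and the machine has enough unified memory (~480 GB for q5.5).
from mlx_lm import load, generate

# Placeholder repo id -- replace with the actual Hugging Face repo for this quant.
model, tokenizer = load("your-org/DeepSeek-V3.1-5.5bit-MLX")

prompt = "Explain the trade-off between 4.5-bit and 5.5-bit quantization."

# Apply the model's chat template if the tokenizer defines one.
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        tokenize=False,
    )

# Per the notes above, expect roughly 13-19 tokens/s on an M3 Ultra.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```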