inferencerlabs committed on
Commit f2ee58a · verified · 1 Parent(s): cd17387

Upload complete model

Files changed (1):
  1. README.md +26 -0
README.md ADDED
@@ -0,0 +1,26 @@
+ ---
+ license: mit
+ library_name: mlx
+ base_model: deepseek-ai/DeepSeek-V3.1
+ tags:
+ - mlx
+ pipeline_tag: text-generation
+ ---
+ **See DeepSeek-V3.1 5.5bit MLX in action - [demonstration video](https://youtube.com/xcreate)**
+
+ *q5.5bit quant typically achieves 1.141 perplexity in our testing.*
+
+ | Quantization | Perplexity |
+ |:------------:|:----------:|
+ | **q2.5** | 41.293 |
+ | **q3.5** | 1.900 |
+ | **q4.5** | 1.168 |
+ | **q5.5** | 1.141 |
+ | **q6.5** | 1.128 |
+ | **q8.5** | 1.128 |
+
+ ## Usage Notes
+
+ * Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
+ * Memory usage: ~480 GB
+ * Expect ~15 tokens/s
+ * For more details, see the [demonstration video](https://youtube.com/xcreate) or visit [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1).
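MLX quants on Hugging Face are commonly loaded through the `mlx-lm` package rather than raw MLX. A minimal sketch of that pattern is below; the card does not state this model's repository id, so the path used here is a placeholder, and running it requires Apple silicon with roughly the ~480 GB of unified memory noted above.

```python
# Hypothetical usage sketch with mlx-lm (pip install mlx-lm).
# The repo id below is a placeholder -- substitute this model's
# actual Hugging Face path before running.
from mlx_lm import load, generate

# Downloads (if needed) and loads the quantized weights plus tokenizer.
model, tokenizer = load("inferencerlabs/DeepSeek-V3.1-MLX-5.5bit")

prompt = "Explain perplexity in one sentence."
text = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(text)
```

At ~15 tokens/s, a 128-token completion like this one should take on the order of ten seconds once the model is resident in memory.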