---
license: mit
library_name: mlx
base_model: deepseek-ai/DeepSeek-V3.1
tags:
- mlx
pipeline_tag: text-generation
---
|
**See DeepSeek-V3.1 5.5bit MLX in action - [demonstration video](https://youtu.be/ufXZI6aqOU8)**
|
|
|
*In our testing, the q5.5-bit quant typically achieves a perplexity of 1.141.*
|
| Quantization | Perplexity |
|:------------:|:----------:|
| **q2.5**     | 41.293     |
| **q3.5**     | 1.900      |
| **q4.5**     | 1.168      |
| **q5.5**     | 1.141      |
| **q6.5**     | 1.128      |
| **q8.5**     | 1.128      |
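The figures above read as the usual definition of perplexity: the exponential of the mean negative log-likelihood per token. A minimal sketch of that computation (illustrative only; the `token_logprobs` input is an assumption, not our actual evaluation harness):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood).

    token_logprobs: the model's natural-log probability of each
    ground-truth token in the evaluation text.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns p = 0.88 to every token scores
# exp(-ln 0.88) ~= 1.136, close to the q5.5 figure above.
print(perplexity([math.log(0.88)] * 100))
```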
|
|
|
## Usage Notes
|
|
|
* Runs on a single M3 Ultra with 512 GB of RAM using the [Inferencer app](https://inferencer.com); a plain `mlx-lm` sketch is shown below
* Memory usage: ~480 GB
* Expect ~13-19 tokens/s
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
* For more details, see the [demonstration video](https://youtu.be/ufXZI6aqOU8) or visit [DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1).
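
If you would rather drive the model from Python than through Inferencer, the stock [mlx-lm](https://github.com/ml-explore/mlx-lm) loader is a reasonable starting point. A minimal sketch, with two caveats: the repo path below is a placeholder for this model's actual Hugging Face ID, and a fractional-bit quant like q5.5 may require the modified MLX build noted above rather than a stock install.

```python
# Sketch only: assumes `pip install mlx-lm` on Apple silicon with
# enough unified memory (~480 GB for this quant).
from mlx_lm import load, generate

# Placeholder ID -- substitute this repository's actual path.
model, tokenizer = load("path/to/DeepSeek-V3.1-5.5bit-mlx")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the MLX project in one sentence."}],
    add_generation_prompt=True,
)

# Streams tokens to stdout when verbose=True; expect ~13-19 tokens/s
# on an M3 Ultra per the notes above.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```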