This model is a merge of three differently quantized models from the unsloth/DeepSeek-R1-0528-GGUF repository. Everything except the routed experts comes from Q8_0; most routed experts come from UD-Q4-XL, while the routed experts of six particularly critical blocks originate from UD-Q5-XL.
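
The per-tensor mix can be verified with the gguf Python package. Below is a minimal sketch, assuming the routed-expert tensors follow the usual llama.cpp naming (blk.<N>.ffn_{gate,up,down}_exps.weight); the file name is only a placeholder:

```python
# Minimal sketch: list the quantization type of every routed-expert tensor
# so the Q8_0 / UD-Q4 / UD-Q5 split described above can be checked per block.
from collections import Counter
from gguf import GGUFReader

reader = GGUFReader("DeepSeek-R1-0528-optimized-for-512Gb.gguf")  # hypothetical file name

per_type = Counter()
for tensor in reader.tensors:
    # Routed experts are usually stored as blk.<N>.ffn_{gate,up,down}_exps.weight
    if "_exps." in tensor.name:
        per_type[tensor.tensor_type.name] += 1
        print(f"{tensor.name}: {tensor.tensor_type.name}")

# Summary: how many expert tensors ended up in each quantization type
print(per_type)
```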

After raising the GPU wired-memory limit on macOS with `sudo sysctl iogpu.wired_limit_mb=516096`, my tests show the model reaches its maximum performance with a 16k context window under this size constraint. A 16k context window is often more than enough, and those with more memory can of course opt for a larger one. The model is clearly much smarter than homogeneously quantized versions of the same size.
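
As a rough usage sketch, the model can be loaded with a 16k context via llama-cpp-python (the file name and prompt are placeholders; any llama.cpp-based runner can be configured the same way):

```python
# Minimal sketch: load the merged GGUF fully offloaded to the GPU (Metal on a Mac)
# with the 16k context window discussed above.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-0528-optimized-for-512Gb.gguf",  # hypothetical file name
    n_ctx=16 * 1024,    # 16k context, as tested under the 512 GB memory limit
    n_gpu_layers=-1,    # offload all layers to the GPU
)

out = llm("Why can mixed per-tensor quantization beat a uniform quant?", max_tokens=256)
print(out["choices"][0]["text"])
```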

Model size: 671B params · Architecture: deepseek2 · Format: GGUF