Update README.md
README.md CHANGED
@@ -7,11 +7,11 @@ tags:
 
 13 TPS
 
-27 TPS with Speculative decoding in LMstudio.
+27 TPS with speculative decoding in LM Studio, an instant 100% speedup for math/code tasks.
 
 Draft model: [DeepScaleR-1.5B-Preview-Q8](https://huggingface.co/mlx-community/DeepScaleR-1.5B-Preview-Q8)
 
-Macbook M4 Max: high power (10 TPS on low-power, GPU draws only 5 watts)
+MacBook M4 Max: high power (10 TPS on low power; the GPU draws only 5 watts, less than your brain)
 
 system prompt: "You are Fuse01. You answer very direct brief and concise"
 
|
@@ -57,9 +57,9 @@ if tokenizer.chat_template is not None:
 
 response = generate(model, tokenizer, prompt=prompt, verbose=True)
 ```
 
-Are you still reading down here?
+Are you still reading down here?
 
-Maybe
-https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant
+Maybe check out this new Q4 lossless quant compression from NexaAI, and tell the MLX community how to improve mlx-lm to get 8-bit quality at 4-bit speed!
+[DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant](https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant)
 
 ore
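
The speculative decoding numbers above come from LM Studio, but the same main/draft pairing can be tried straight from mlx-lm, which this README already uses. A minimal sketch, not the author's setup: it assumes a recent mlx-lm whose `generate` accepts `draft_model`/`num_draft_tokens` arguments (check your version), and the main-model path is a placeholder.

```python
# Sketch: speculative decoding with mlx-lm (assumes a recent mlx-lm where
# generate() accepts draft_model / num_draft_tokens; check your version).
from mlx_lm import load, generate

# Main model: placeholder path, substitute the model this card describes.
model, tokenizer = load("mlx-community/YOUR-MAIN-MODEL")

# Draft model from the card: small and fast, it proposes tokens that the
# main model then verifies in a single batched pass.
draft_model, _ = load("mlx-community/DeepScaleR-1.5B-Preview-Q8")

messages = [
    {"role": "system", "content": "You are Fuse01. You answer very direct brief and concise"},
    {"role": "user", "content": "Write a Python function that checks if a number is prime."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints tokens-per-second, so the 13 -> 27 TPS comparison
# can be reproduced by running once with and once without draft_model.
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    draft_model=draft_model,   # assumption: kwarg name in current mlx-lm
    num_draft_tokens=4,        # assumption: tune per model pair
    verbose=True,
)
```

Math and code prompts benefit most because the draft model's guesses get accepted more often on low-entropy output, which is where the roughly 2x jump (13 to 27 TPS) comes from.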
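
For context on the "8-bit quality at 4-bit speed" pitch: a plain 4-bit MLX quant, the baseline NexaQuant is compared against, is normally produced with mlx-lm's converter. A sketch under assumptions, not the NexaQuant method (which is not published here); the `convert` arguments are mlx-lm's standard quantization knobs, and the output path is made up.

```python
# Sketch: a standard 4-bit MLX quant, i.e. the baseline that
# "8-bit quality at 4-bit speed" claims to beat. Not the NexaQuant method.
from mlx_lm import convert

convert(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",   # source HF repo
    mlx_path="DeepSeek-R1-Distill-Qwen-1.5B-Q4",   # made-up output dir
    quantize=True,
    q_bits=4,         # 4-bit weights: fastest, largest quality hit
    q_group_size=64,  # mlx-lm default; smaller groups recover quality, cost speed
)
```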