mlx-community/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-Q8

8-bit precision

bobig committed on Feb 21

Commit a6b9807 (verified) · 1 parent: 9bebb31

Update README.md

Files changed (1)

README.md +7 -6

@@ -7,9 +7,11 @@ tags:
 13 TPS
-27 TPS with Speculative decoding in LMstudio, yeah instant 100% upgrade for math/code stuff.
-Draft model: [DeepScaleR-1.5B-Preview-Q8](https://huggingface.co/mlx-community/DeepScaleR-1.5B-Preview-Q8)
 Macbook M4 Max: high power (10 TPS on low-power, GPU draws only 5 watts...less than your brain)
@@ -23,11 +25,10 @@ Context: 131072, Temp: 0
 Try this model in Visual Studio Code with the Roo Code extension. Starting in Architect Mode and letting it auto switch to Code Mode.... it actually spits decent code for small projects with multiple files.
 Getting close to last year's Claude Sonnet for small projects.  It actually stays reasonably stable even with Roo Code's huge 10k system prompt.  The model still shits the bed for big projects but better after adding roo-code-memory-bank.
 So far (Feb 20, 2025) this is the only model & quant that runs fast on Mac, spits decent code on projects AND works with Speculative Decoding.
-Huge thanks to all who helped Macs get this far!
 # bobig/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-Q8
@@ -57,9 +58,9 @@ if tokenizer.chat_template is not None:
 response = generate(model, tokenizer, prompt=prompt, verbose=True)
 ```
-Are you still reading down here?
-Maybe check out this new Q4 lossless quant compression from NexaAI and tell the MLX community how to improve mlx-lm to get 8-bit quality at 4-bit speed!
 [DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant](https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant)

 13 TPS
+27 TPS with Draft model: [DeepScaleR-1.5B-Preview-Q8](https://huggingface.co/mlx-community/DeepScaleR-1.5B-Preview-Q8)
+oh yeah, 100% faster for math/code stuff.
 Macbook M4 Max: high power (10 TPS on low-power, GPU draws only 5 watts...less than your brain)
 Try this model in Visual Studio Code with the Roo Code extension. Starting in Architect Mode and letting it auto switch to Code Mode.... it actually spits decent code for small projects with multiple files.
 Getting close to last year's Claude Sonnet for small projects.  It actually stays reasonably stable even with Roo Code's huge 10k system prompt.  The model still shits the bed for big projects but better after adding roo-code-memory-bank.
 So far (Feb 20, 2025) this is the only model & quant that runs fast on Mac, spits decent code on projects AND works with Speculative Decoding.
+Huge thanks to all who helped Macs get this far!
 # bobig/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview-Q8
 response = generate(model, tokenizer, prompt=prompt, verbose=True)
 ```
+Are you still reading down here?
+Maybe check out this new Q4 lossless from NexaAI and tell the MLX community how to improve mlx-lm to get 8-bit quality at 4-bit speed!
 [DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant](https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant)
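
The throughput figures quoted in the README (13 TPS for the model alone, 27 TPS with the draft model) can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names and the 1000-token response length are assumptions chosen for the example, not anything taken from the README or from mlx-lm.

```python
# Throughput figures quoted in the README text (tokens per second).
BASELINE_TPS = 13.0      # 32B Q8 model on its own
SPECULATIVE_TPS = 27.0   # same model with the 1.5B draft model


def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of the new throughput to the baseline throughput."""
    return new_tps / baseline_tps


def percent_faster(baseline_tps: float, new_tps: float) -> float:
    """Throughput gain expressed as a percentage over the baseline."""
    return (speedup(baseline_tps, new_tps) - 1.0) * 100.0


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds needed to generate `tokens` at a steady `tps`."""
    return tokens / tps


if __name__ == "__main__":
    # Hypothetical response length, used only to make the gain tangible.
    response_tokens = 1000

    print(f"speedup:        {speedup(BASELINE_TPS, SPECULATIVE_TPS):.2f}x")
    print(f"percent faster: {percent_faster(BASELINE_TPS, SPECULATIVE_TPS):.0f}%")
    print(f"baseline:       {seconds_for(response_tokens, BASELINE_TPS):.1f} s")
    print(f"speculative:    {seconds_for(response_tokens, SPECULATIVE_TPS):.1f} s")
```

27 / 13 is about 2.08, so the measured gain is roughly 108%, which is consistent with the README's "100% faster" claim. Speculative decoding gains depend on how often the draft model's tokens are accepted, so the figure applies to the math/code workloads the README mentions and may be lower elsewhere.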