bobig committed on
Commit 80aab64 · verified · 1 Parent(s): 4250828

Update README.md

Files changed (1): README.md (+5 −5)
README.md CHANGED
@@ -7,11 +7,11 @@ tags:
 
 13 TPS
 
-27 TPS with Speculative decoding in LMstudio.
+27 TPS with Speculative decoding in LMstudio, yeah instant 100% upgrade for math/code stuff.
 
 Draft model: [DeepScaleR-1.5B-Preview-Q8](https://huggingface.co/mlx-community/DeepScaleR-1.5B-Preview-Q8)
 
-Macbook M4 Max: high power (10 TPS on low-power, GPU draws only 5 watts)
+Macbook M4 Max: high power (10 TPS on low-power, GPU draws only 5 watts...less than your brain)
 
 system prompt: "You are Fuse01. You answer very direct brief and concise"
 
@@ -57,9 +57,9 @@ if tokenizer.chat_template is not None:
 response = generate(model, tokenizer, prompt=prompt, verbose=True)
 ```
 
-Are you still reading down here? Really?
+Are you still reading down here?
 
-Maybe use your OCD super powers to try this new Q4 lossless quant compression and tell us how to improve mlx-lm to get 8-bit quality at 4-bit speed!
-https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant
+Maybe check out this new Q4 lossless quant compression from NexaAI and tell the MLX community how to improve mlx-lm to get 8-bit quality at 4-bit speed!
+[DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant](https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant)
 
 ore
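For anyone who wants to try the speculative-decoding setup from the first hunk outside LM Studio, here is a minimal sketch using mlx-lm. It assumes a recent mlx-lm release in which `generate()` forwards a `draft_model` argument to speculative decoding (the kwarg name is an assumption here, so verify it against your installed version); the main-model path and the prompt are placeholders.

```python
# Sketch: speculative decoding with mlx-lm, assuming a recent release
# whose generate() accepts a draft_model kwarg (verify for your version).
from mlx_lm import load, generate

# Main model (placeholder path) and the small draft model named in the README.
model, tokenizer = load("path/to/this-model")
draft_model, _ = load("mlx-community/DeepScaleR-1.5B-Preview-Q8")

# The draft model proposes several tokens per step and the main model
# verifies them in one pass, which is where the 13 -> 27 TPS jump comes from.
response = generate(
    model,
    tokenizer,
    prompt="Explain speculative decoding in one sentence.",  # placeholder prompt
    draft_model=draft_model,
    verbose=True,
)
```

Note that a draft model only works when it shares the main model's tokenizer and vocabulary, which is why a small Qwen-based model is used here.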
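And for context on the `if tokenizer.chat_template is not None:` line in the second hunk, this is roughly how the quoted system prompt would be wired into the README's snippet, following the standard mlx-lm model-card pattern (the model path and user message are placeholders, and whether the chat template honors a system role depends on the model):

```python
# Sketch: applying the README's system prompt through the tokenizer's chat
# template before calling generate(), mirroring the snippet in the diff.
from mlx_lm import load, generate

model, tokenizer = load("path/to/this-model")  # placeholder path

prompt = "What is speculative decoding?"  # placeholder user message
if tokenizer.chat_template is not None:
    messages = [
        {"role": "system", "content": "You are Fuse01. You answer very direct brief and concise"},
        {"role": "user", "content": prompt},
    ]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```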