13 TPS by itself.

27 TPS with speculative decoding in LM Studio: yeah, an instant ~100% speedup for math/code prompts.

Draft model: [DeepScaleR-1.5B-Preview-Q8](https://huggingface.co/mlx-community/DeepScaleR-1.5B-Preview-Q8)
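Why a small draft model roughly doubles throughput: the draft cheaply guesses a few tokens ahead, and the big target model only verifies them in one batched pass, paying full price only at the first mismatch. Here is a toy greedy sketch (illustrative only; the function and the stand-in "models" are made up, not LM Studio's or mlx-lm's API):

```python
def greedy_speculative_decode(target_next, draft_next, prompt, max_new, k=4):
    """target_next / draft_next map a token list to the next token (greedy).
    Returns (tokens, verify_rounds); each round stands in for one batched
    forward pass of the expensive target model over the k drafted positions."""
    tokens = list(prompt)
    verify_rounds = 0
    while len(tokens) - len(prompt) < max_new:
        # 1) The cheap draft model speculates k tokens ahead.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2) The target checks the drafted tokens (one batched pass in reality).
        verify_rounds += 1
        accepted = []
        for t in draft:
            expected = target_next(tokens + accepted)
            if t == expected:
                accepted.append(t)         # draft was right: a "free" token
            else:
                accepted.append(expected)  # first miss: keep target's token, stop
                break
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new], verify_rounds

# "Counting" stand-in model: the next token is always previous + 1.
count = lambda seq: seq[-1] + 1

# Perfect draft: k tokens accepted per round -> max_new / k target passes.
toks, rounds = greedy_speculative_decode(count, count, [0], max_new=8, k=4)

# Useless draft: one token per round -> no speedup over plain decoding.
_, bad_rounds = greedy_speculative_decode(count, lambda seq: -1, [0], max_new=8, k=4)
```

When the draft agrees often (as a small distilled model tends to on math/code), most target passes emit several tokens at once, which is where a 13-to-27 TPS jump can come from; when it rarely agrees, you fall back to roughly one token per pass.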
MacBook M4 Max on high power (10 TPS on low power, where the GPU draws only 5 watts...less than your brain)

System prompt: "You are Fuse01. You answer very direct brief and concise"
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```

Are you still reading down here?

Maybe check out this new Q4 lossless quant compression from NexaAI and tell the MLX community how to improve mlx-lm to get 8-bit quality at 4-bit speed!

[DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant](https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant)
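The "8-bit quality at 4-bit speed" pitch is the hard part: plain 4-bit rounding is far coarser than 8-bit. A toy symmetric round-trip quantizer (purely illustrative, nothing to do with NexaQuant's actual method) shows the error gap such schemes have to close:

```python
import random

def quantize_roundtrip(xs, bits):
    # Symmetric per-tensor quantization: pick a scale so the largest
    # weight maps to the top integer level, round, then dequantize.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in xs) / qmax
    return [round(x / scale) * scale for x in xs]

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1000)]  # fake weight tensor

mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
err8 = mse(weights, quantize_roundtrip(weights, 8))
err4 = mse(weights, quantize_roundtrip(weights, 4))
# 4-bit has 16 levels vs 256 for 8-bit, so its naive rounding error is
# orders of magnitude larger; clever quant formats try to close that gap.
```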