Update README.md
README.md CHANGED
@@ -7,11 +7,11 @@ tags:
 
 13 TPS
 
-27 TPS with Speculative decoding in LMstudio.
+27 TPS with speculative decoding in LM Studio, an instant 100% speedup for math/code tasks.
 
 Draft model: [DeepScaleR-1.5B-Preview-Q8](https://huggingface.co/mlx-community/DeepScaleR-1.5B-Preview-Q8)
 
-Macbook M4 Max: high power (10 TPS on low-power, GPU draws only 5 watts)
+MacBook M4 Max: high power (10 TPS on low power; the GPU draws only 5 watts, less than your brain)
 
 system prompt: "You are Fuse01. You answer very direct brief and concise"
 
|
@@ -57,9 +57,9 @@ if tokenizer.chat_template is not None:
 
 response = generate(model, tokenizer, prompt=prompt, verbose=True)
 ```
 
-Are you still reading down here?
+Are you still reading down here?
 
-Maybe
-https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant
+Maybe check out this new Q4 lossless quant compression from NexaAI, and tell the MLX community how to improve mlx-lm to get 8-bit quality at 4-bit speed!
+[DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant](https://huggingface.co/NexaAIDev/DeepSeek-R1-Distill-Qwen-1.5B-NexaQuant)
 
 ore
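
The speculative decoding numbers above come from LM Studio, but the same main/draft pairing can be tried straight from mlx-lm, which this README already uses. A minimal sketch, not the author's setup: it assumes a recent mlx-lm whose `generate` accepts `draft_model`/`num_draft_tokens` arguments (check your version), and the main-model path is a placeholder.

```python
# Sketch: speculative decoding with mlx-lm (assumes a recent mlx-lm where
# generate() accepts draft_model / num_draft_tokens; check your version).
from mlx_lm import load, generate

# Main model: placeholder path, substitute the model this card describes.
model, tokenizer = load("mlx-community/YOUR-MAIN-MODEL")

# Draft model from the card: small and fast, it proposes tokens that the
# main model then verifies in a single batched pass.
draft_model, _ = load("mlx-community/DeepScaleR-1.5B-Preview-Q8")

messages = [
    {"role": "system", "content": "You are Fuse01. You answer very direct brief and concise"},
    {"role": "user", "content": "Write a Python function that checks if a number is prime."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints tokens-per-second, so the 13 -> 27 TPS comparison
# can be reproduced by running once with and once without draft_model.
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    draft_model=draft_model,   # assumption: kwarg name in current mlx-lm
    num_draft_tokens=4,        # assumption: tune per model pair
    verbose=True,
)
```

Math and code prompts benefit most because the draft model's guesses get accepted more often on low-entropy output, which is where the roughly 2x jump (13 to 27 TPS) comes from.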
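
For context on the "8-bit quality at 4-bit speed" pitch: a plain 4-bit MLX quant, the baseline NexaQuant is compared against, is normally produced with mlx-lm's converter. A sketch under assumptions, not the NexaQuant method (which is not published here); the `convert` arguments are mlx-lm's standard quantization knobs, and the output path is made up.

```python
# Sketch: a standard 4-bit MLX quant, i.e. the baseline that
# "8-bit quality at 4-bit speed" claims to beat. Not the NexaQuant method.
from mlx_lm import convert

convert(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",   # source HF repo
    mlx_path="DeepSeek-R1-Distill-Qwen-1.5B-Q4",   # made-up output dir
    quantize=True,
    q_bits=4,         # 4-bit weights: fastest, largest quality hit
    q_group_size=64,  # mlx-lm default; smaller groups recover quality, cost speed
)
```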