Upload complete model
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
 ---
 **See Kimi-K2 Dynamic MLX in action - [https://youtu.be/-zfUvA2CDqE](https://youtu.be/-zfUvA2CDqE)**

-*q3.95bit dynamic quant achieves 1.243 perplexity in our testing, slotting closer to q4 perplexity (1.168) than q3 perplexity (1.900).*
+*q3.95bit dynamic quant typically achieves 1.243 perplexity in our testing, slotting closer to q4 perplexity (1.168) than q3 perplexity (1.900).*
 | Quantization | Perplexity |
 |:------------:|:----------:|
 | **q2** | 41.293 |
@@ -21,10 +21,10 @@ tags:

 ## Usage Notes

-*
-* Runs on a single M3 Ultra 512GB RAM
+* Runs on a single M3 Ultra 512GB RAM using [Inferencer app](https://inferencer.com)
 * Requires expanding VRAM limit to at least ~500000 MB
 * For a larger context window, 507000 is used in VRAM limit command below.
 * `sudo sysctl iogpu.wired_limit_mb=507000`
 * Expect ~20 tokens/s
+* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
 * For more details see [demonstration video](https://youtu.be/-zfUvA2CDqE) or visit [Kimi K2](https://moonshotai.github.io/Kimi-K2/).
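
The usage notes raise the GPU wired-memory limit with `sudo sysctl iogpu.wired_limit_mb=507000`. As a rough sketch (not part of the README itself), the check below reads the current value first and only prints the command when the limit is lower than the target. The helper name `needs_raise` and the variable `TARGET_MB` are made up for illustration, and treating an unreadable value as 0 is an assumption.

```shell
#!/bin/sh
# Target limit in MB, taken from the usage notes.
TARGET_MB=507000

# Hypothetical helper: succeeds (exit 0) when the current limit is
# below the target, i.e. when the limit still needs to be raised.
needs_raise() {
    current="$1"
    target="$2"
    [ "$current" -lt "$target" ]
}

# The iogpu.wired_limit_mb key only exists on macOS, so skip elsewhere.
if [ "$(uname -s)" = "Darwin" ]; then
    # Assumption: if the key cannot be read, fall back to 0.
    current="$(sysctl -n iogpu.wired_limit_mb 2>/dev/null || echo 0)"
    if needs_raise "$current" "$TARGET_MB"; then
        echo "Current limit is ${current} MB; raise it with:"
        echo "  sudo sysctl iogpu.wired_limit_mb=${TARGET_MB}"
    else
        echo "Limit is already ${current} MB"
    fi
fi
```

The script only prints the `sudo` command rather than running it, so nothing changes without an explicit step from the user. Values set this way with `sysctl` normally do not survive a reboot, so the check may need to be repeated after restarting.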