inferencerlabs / Kimi-K2-Instruct-MLX-3.985bit

Text Generation

4-bit precision


inferencerlabs committed on 2 days ago

Commit b500e91 (verified) · Parent: e02d474

Upload complete model

Files changed (1)

README.md +3 -3

README.md CHANGED

@@ -9,7 +9,7 @@ tags:
 ---
 **See Kimi-K2 Dynamic MLX in action - [https://youtu.be/-zfUvA2CDqE](https://youtu.be/-zfUvA2CDqE)**
-*q3.95bit dynamic quant achieves 1.243 perplexity in our testing, slotting closer to q4 perplexity (1.168) than q3 perplexity (1.900).*
+*q3.95bit dynamic quant typically achieves 1.243 perplexity in our testing, slotting closer to q4 perplexity (1.168) than q3 perplexity (1.900).*
 | Quantization | Perplexity |
 |:------------:|:----------:|
 | **q2**       | 41.293     |
@@ -21,10 +21,10 @@ tags:
 ## Usage Notes
-* Built with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
-* Runs on a single M3 Ultra 512GB RAM
+* Runs on a single M3 Ultra 512GB RAM using [Inferencer app](https://inferencer.com)
 * Requires expanding VRAM limit to at least ~500000 MB
   * For a larger context window, 507000 is used in VRAM limit command below.
   * `sudo sysctl iogpu.wired_limit_mb=507000`
 * Expect ~20 tokens/s
+* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
 * For more details see [demonstration video](https://youtu.be/-zfUvA2CDqE) or visit [Kimi K2](https://moonshotai.github.io/Kimi-K2/).
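The perplexity figures in this README come without an evaluation setup (dataset, context length, tokenization), so they are best read as relative comparisons between quantization levels. For reference, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The sketch below assumes that standard definition. The `perplexity` helper and its sample log-probabilities are hypothetical illustrations, not the authors' evaluation code. It also checks the quoted claim that the dynamic quant sits closer to q4 than to q3.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp(-(1/N) * sum(log p(token_i))), natural-log probs.

    Hypothetical helper for illustration; the README does not describe
    how its perplexity numbers were computed.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical example: a model assigning p=0.5 to every token has perplexity ~2.
example_ppl = perplexity([math.log(0.5)] * 4)

# Perplexities quoted in the README note (lower is better).
reported = {"q3": 1.900, "q3.95 dynamic": 1.243, "q4": 1.168}
gap_to_q4 = reported["q3.95 dynamic"] - reported["q4"]
gap_to_q3 = reported["q3"] - reported["q3.95 dynamic"]

print(f"example perplexity: {example_ppl:.3f}")
print(f"gap to q4: {gap_to_q4:.3f}, gap to q3: {gap_to_q3:.3f}")
```

With the quoted numbers the gap to q4 is about 0.075 versus about 0.657 to q3, which is what "slotting closer to q4" means arithmetically.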
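The ~500000 MB VRAM requirement can be put in context with a back-of-the-envelope estimate of the quantized weight size. This sketch assumes roughly 1 trillion total parameters (the figure Moonshot AI publishes for Kimi K2) and an average of 3.985 bits per weight taken from the repository name. It also assumes 1 MB = 2**20 bytes and that "512GB" means 512 × 1024 MB. It ignores KV cache, activations, and quantization metadata such as scales, so treat the result as a rough lower bound on weight storage, not a measurement. The helper `weight_footprint_mb` is hypothetical.

```python
def weight_footprint_mb(num_params: float, bits_per_weight: float) -> float:
    """Approximate quantized-weight storage in MB (assumes 1 MB = 2**20 bytes).

    Hypothetical helper: counts only the packed weights, not scales/biases,
    KV cache, or activations.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**20


# Assumptions (not stated in the README): ~1e12 total parameters for Kimi K2,
# and an average of 3.985 bits/weight taken from the repo name.
PARAMS = 1.0e12
BITS = 3.985

weights_mb = weight_footprint_mb(PARAMS, BITS)

# Values from the usage notes: the suggested wired limit and the machine's RAM.
WIRED_LIMIT_MB = 507_000
TOTAL_RAM_MB = 512 * 1024  # assumes "512GB" means 512 GiB

print(f"estimated weights: {weights_mb:,.0f} MB")
print(f"left under wired limit for KV cache etc.: {WIRED_LIMIT_MB - weights_mb:,.0f} MB")
print(f"RAM left outside the wired limit: {TOTAL_RAM_MB - WIRED_LIMIT_MB:,} MB")
```

Under these assumptions the weights alone come to roughly 475,000 MB. The 507000 MB limit leaves about 32,000 MB of GPU-wired memory for context and about 17,000 MB outside the limit for the rest of the system.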