---
quantized_by: gghfez
pipeline_tag: text-generation
base_model: deepseek-ai/DeepSeek-V3.1
license: mit
base_model_relation: quantized
tags:
- mla
- imatrix
- deepseek_v3.1
- conversational
- ik_llama.cpp
---

## `ik_llama.cpp` imatrix Quantizations of deepseek-ai/DeepSeek-V3.1
This quant **REQUIRES** the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork for ik's latest SOTA quants and optimizations! Do **not** download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.
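
If you haven't built `ik_llama.cpp` yet, something along these lines should get you working binaries (a minimal sketch following the usual llama.cpp-style cmake flow; adjust the flags for your hardware):

```bash
# Minimal sketch: clone and build ik_llama.cpp with CUDA enabled.
# Omit -DGGML_CUDA=ON for a CPU-only build.
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j "$(nproc)"
```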

*NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., so you can try it out before downloading my quants.

I made this quant for my own RAM+VRAM setup. For more `ik_llama.cpp` quants of this model, plus discussion and perplexity measurements, see @ubergarm's [DeepSeek-V3.1 Collection](https://huggingface.co/ubergarm/DeepSeek-V3.1-GGUF).

<details>
<summary>👈 Quant details</summary>

```bash
#!/usr/bin/env bash

custom="
# First 3 dense layers (0-2) (GPU)
# Using q8_0 for attn_k_b since the imatrix might not have these tensors
blk\.[0-2]\.attn_k_b.*=q8_0
blk\.[0-2]\.attn_.*=iq5_ks
blk\.[0-2]\.ffn_down.*=iq5_ks
blk\.[0-2]\.ffn_(gate|up).*=iq4_ks
blk\.[0-2]\..*=iq5_ks

# All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
# Using q8_0 for attn_k_b since the imatrix might not have these tensors
blk\.[3-9]\.attn_k_b.*=q8_0
blk\.[1-5][0-9]\.attn_k_b.*=q8_0
blk\.60\.attn_k_b.*=q8_0

blk\.[3-9]\.attn_.*=iq5_ks
blk\.[1-5][0-9]\.attn_.*=iq5_ks
blk\.60\.attn_.*=iq5_ks

# Shared Expert (3-60) (GPU)
blk\.[3-9]\.ffn_down_shexp\.weight=iq5_ks
blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq5_ks
blk\.60\.ffn_down_shexp\.weight=iq5_ks

blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks

# Routed Experts (3-60) (CPU)
blk\.[3-9]\.ffn_down_exps\.weight=iq3_ks
blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq3_ks
blk\.60\.ffn_down_exps\.weight=iq3_ks

blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks

# Token embedding and output tensors (GPU)
# Changed output to q8_0 (note kept on its own line: an inline comment
# would survive the grep below and corrupt the --custom-q string)
token_embd\.weight=iq5_k
output\.weight=q8_0
"

# Strip the comment lines and join the remaining rules into the single
# comma-separated list that --custom-q expects
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)
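# At this point "$custom" is a single comma-separated line of rules,
# e.g. (abbreviated, from the rules above):
#   blk\.[0-2]\.attn_k_b.*=q8_0,blk\.[0-2]\.attn_.*=iq5_ks,...,output\.weight=q8_0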

./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /fast/DeepSeek-V3.1.imatrix \
    /fast/bf16/DeepSeek-V3-00001-of-00030.gguf \
    /fast2/quants/DeepSeek-V3.1-IQ2_KS.gguf \
    IQ2_KS
```
</details>
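
To actually run the quant, a launch along these lines is a reasonable starting point (a sketch, not a definitive command: the context size, thread count, GPU layer count, and host/port below are placeholders to adapt to your hardware; `-mla`, `-fa`, `-amb`, `-fmoe`, and `--override-tensor` are `ik_llama.cpp` options that keep the routed experts in system RAM while attention and the shared experts run on GPU):

```bash
# Sketch: serve the quant with ik_llama.cpp's llama-server.
# -mla 3 -fa                 : MLA attention plus flash attention
# -fmoe                      : fused MoE kernels
# -amb 512                   : cap the attention compute buffer (MiB)
# --override-tensor exps=CPU : keep routed expert tensors (ffn_*_exps) on CPU
./build/bin/llama-server \
    --model /fast2/quants/DeepSeek-V3.1-IQ2_KS.gguf \
    --ctx-size 32768 \
    -mla 3 -fa \
    -amb 512 \
    -fmoe \
    --n-gpu-layers 99 \
    --override-tensor exps=CPU \
    --threads 16 \
    --host 127.0.0.1 --port 8080
```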