# ik_llama.cpp imatrix Quantizations of deepseek-ai/DeepSeek-V3.1
This quant **REQUIRES** the ik_llama.cpp fork, which supports ik's latest SOTA quants and optimizations! Do not download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.!

*NOTE*: ik_llama.cpp can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., if you want to try it out before downloading my quants.
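If you have not built ik_llama.cpp yet, a minimal build sketch is below. It assumes the standard CMake workflow the fork inherits from llama.cpp and a CUDA GPU; the `-DGGML_CUDA=ON` flag and job count are my assumptions, so adjust for your toolchain.

```bash
# Assumed build steps, adjust for your hardware/toolchain.
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON            # drop -DGGML_CUDA=ON for a CPU-only build
cmake --build build --config Release -j "$(nproc)"
# Binaries such as llama-server and llama-quantize land in ./build/bin/
```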
I made this for myself and my RAM+VRAM setup. For more ik_llama.cpp quants of this model, discussions, and perplexity measurements, see @ubergarm's DeepSeek-V3.1 Collection.
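For orientation, here is a sketch of a hybrid GPU+CPU llama-server invocation in the usual ik_llama.cpp style, matching this recipe's layout (attention and shared experts on GPU, routed experts on CPU). The model path, alias, thread count, and context size are placeholders; confirm the flags your build supports with `./build/bin/llama-server --help`.

```bash
./build/bin/llama-server \
    --model /fast2/quants/DeepSeek-V3.1-IQ2_KS.gguf \
    --alias gghfez/DeepSeek-V3.1-IQ2_KS \
    --ctx-size 32768 \
    -ctk q8_0 \
    -mla 3 -fa \
    -amb 512 \
    -fmoe \
    --n-gpu-layers 99 \
    --override-tensor exps=CPU \
    --threads 16 \
    --host 127.0.0.1 \
    --port 8080
# --override-tensor exps=CPU keeps the routed expert tensors in system RAM,
# mirroring the "(CPU)" groups in the quant recipe below; everything else is offloaded.
```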
<details>

<summary>👈 Quant details</summary>

```bash
#!/usr/bin/env bash

custom="
# First 3 dense layers (0-2) (GPU)
# Using q8_0 for attn_k_b since imatrix might not have these tensors
blk\.[0-2]\.attn_k_b.*=q8_0
blk\.[0-2]\.attn_.*=iq5_ks
blk\.[0-2]\.ffn_down.*=iq5_ks
blk\.[0-2]\.ffn_(gate|up).*=iq4_ks
blk\.[0-2]\..*=iq5_ks

# All attention, norm weights, and bias tensors for MoE layers (3-60) (GPU)
# Using q8_0 for attn_k_b since imatrix might not have these tensors
blk\.[3-9]\.attn_k_b.*=q8_0
blk\.[1-5][0-9]\.attn_k_b.*=q8_0
blk\.60\.attn_k_b.*=q8_0
blk\.[3-9]\.attn_.*=iq5_ks
blk\.[1-5][0-9]\.attn_.*=iq5_ks
blk\.60\.attn_.*=iq5_ks

# Shared Expert (3-60) (GPU)
blk\.[3-9]\.ffn_down_shexp\.weight=iq5_ks
blk\.[1-5][0-9]\.ffn_down_shexp\.weight=iq5_ks
blk\.60\.ffn_down_shexp\.weight=iq5_ks
blk\.[3-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
blk\.[1-5][0-9]\.ffn_(gate|up)_shexp\.weight=iq4_ks
blk\.60\.ffn_(gate|up)_shexp\.weight=iq4_ks

# Routed Experts (3-60) (CPU)
blk\.[3-9]\.ffn_down_exps\.weight=iq3_ks
blk\.[1-5][0-9]\.ffn_down_exps\.weight=iq3_ks
blk\.60\.ffn_down_exps\.weight=iq3_ks
blk\.[3-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
blk\.[1-5][0-9]\.ffn_(gate|up)_exps\.weight=iq2_ks
blk\.60\.ffn_(gate|up)_exps\.weight=iq2_ks

# Token embedding and output tensors (GPU)
token_embd\.weight=iq5_k
# Changed output.weight to q8_0
output\.weight=q8_0
"

# Drop the comment lines and join the remaining rules into one comma-separated list
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

./build/bin/llama-quantize \
    --custom-q "$custom" \
    --imatrix /fast/DeepSeek-V3.1.imatrix \
    /fast/bf16/DeepSeek-V3-00001-of-00030.gguf \
    /fast2/quants/DeepSeek-V3.1-IQ2_KS.gguf \
    IQ2_KS
```

</details>
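After the grep/sed pipeline in the recipe above, `$custom` is a single comma-separated list of `regex=type` rules, which is the form `--custom-q` expects. A quick sanity check (my addition, not part of the original recipe) is to print the joined rules one per line before quantizing:

```bash
# Verify the rules survived the join: each line should read like blk\.[0-2]\.attn_k_b.*=q8_0
echo "$custom" | tr ',' '\n'
```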