Experimental "iQ4_W" Quantization of Cydonia-24B-4.1

Original model: TheDrummer/Cydonia-24B-v4.1

## iQ4_W Quantization Scheme

"iQ4_W" is an unofficial llama.cpp quantization scheme inspired by Q4_K_X.

The GGUF model available in this repo is quantized as follows:

| Tensor name | Q4_K_X | iQ4_W |
|-------------|--------|-------|
| token_embd | Q4_K | Q5_K |
| ffn_gate | Q4_K | IQ4_XS |
| ffn_up | Q4_K | IQ4_XS |
| ffn_down | Q5_K | Q5_K |
| attn_q | Q4_K | Q5_K |
| attn_k | Q8_0 | Q8_0 |
| attn_v | Q8_0 | Q8_0 |
| attn_output | Q5_K | Q5_K |
| output | Q8_0 | Q6_K |

Layers 0, 1, 2, 38, and 39 are quantized wider:

| Tensor name | Q4_K_X | iQ4_W |
|-------------|--------|-------|
| (0/1/2/38/39).ffn_gate | Q4_K | Q5_K |
| (0/1/2/38/39).ffn_up | Q4_K | Q5_K |
| (0/1/2/38/39).ffn_down | Q5_K | Q6_K |
| (0/1/2/38/39).attn_q | Q4_K | Q6_K |
| (0/1/2/38/39).attn_output | Q5_K | Q6_K |
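
This repo doesn't state how the mix was produced, but recent llama.cpp builds let `llama-quantize` override individual tensor types via `--tensor-type`, `--output-tensor-type`, and `--token-embedding-type`, so a mix like the above can be reproduced along these lines. A minimal sketch; the file paths, fallback type, and exact pattern syntax here are assumptions, so check `llama-quantize --help` on your build:

```python
# Sketch: building an iQ4_W-style mix with llama-quantize's per-tensor
# overrides. Paths and regex patterns are illustrative assumptions.
import subprocess

overrides = [
    "ffn_gate=iq4_xs",
    "ffn_up=iq4_xs",
    "ffn_down=q5_k",
    "attn_q=q5_k",
    "attn_k=q8_0",
    "attn_v=q8_0",
    "attn_output=q5_k",
    # "Wider" first/last layers (0, 1, 2, 38, 39) get one step more precision.
    r"blk\.(0|1|2|38|39)\.ffn_gate=q5_k",
    r"blk\.(0|1|2|38|39)\.ffn_up=q5_k",
    r"blk\.(0|1|2|38|39)\.ffn_down=q6_k",
    r"blk\.(0|1|2|38|39)\.attn_q=q6_k",
    r"blk\.(0|1|2|38|39)\.attn_output=q6_k",
]

cmd = ["llama-quantize",
       "--token-embedding-type", "q5_k",
       "--output-tensor-type", "q6_k"]
for spec in overrides:
    cmd += ["--tensor-type", spec]
# Q4_K_M as the fallback type for any tensor not matched above (assumption).
cmd += ["Cydonia-24B-v4.1-F16.gguf", "Cydonia-24B-v4.1-iQ4_W.gguf", "Q4_K_M"]

subprocess.run(cmd, check=True)
```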

## KL-divergence from Q8_0

| Quant | BPW | Mean KLD | 99.9% KLD | 99.0% KLD | Median KLD |
|-------|-----|----------|-----------|-----------|------------|
| Q5_K_S | 5.53 | 0.010427 | 0.507101 | 0.145578 | 0.004048 |
| iQ4_W | 5.01 | 0.015892 | 0.987876 | 0.255739 | 0.004836 |
| Q4_K_X | 5.01 | 0.015803 | 1.001576 | 0.250834 | 0.004847 |
| Q4_K_M | 4.86 | 0.025398 | 1.533436 | 0.367073 | 0.007728 |
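
llama.cpp's `llama-perplexity` tool reports exactly these statistics when run with `--kl-divergence` against reference logits saved via `--kl-divergence-base`, which is presumably how this table was produced. For reference, a sketch of what the columns mean, with hypothetical helper names:

```python
# Sketch of the statistics in the table above: per-token KL divergence of a
# quant's next-token distribution q from the Q8_0 reference distribution p.
# Helper names are hypothetical; llama.cpp computes these internally.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    """KL(p || q) = sum_i p_i * (log p_i - log q_i) for one token position."""
    p, q = softmax(p_logits), softmax(q_logits)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def kld_summary(kld_per_token: np.ndarray) -> dict:
    """The columns reported above: mean, 99.9th/99th percentile, median."""
    return {
        "mean":   float(kld_per_token.mean()),
        "p99.9":  float(np.quantile(kld_per_token, 0.999)),
        "p99":    float(np.quantile(kld_per_token, 0.99)),
        "median": float(np.median(kld_per_token)),
    }
```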

## Usage

Prompt template: Mistral v7 Tekken.
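
A minimal way to run the GGUF, sketched here with llama-cpp-python; the file name and context size are assumptions, and a recent build should apply the embedded chat template (i.e., the Tekken formatting) automatically:

```python
# Sketch: loading the quant with llama-cpp-python. model_path and n_ctx
# are assumptions; the GGUF's embedded chat template should handle the
# Mistral v7 Tekken formatting on recent builds.
from llama_cpp import Llama

llm = Llama(model_path="Cydonia-24B-v4.1-iQ4_W.gguf", n_ctx=8192)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one line."}],
    temperature=0.8,
)
print(out["choices"][0]["message"]["content"])
```

Note that the more exotic samplers below (top_nsigma, smoothing, DRY) are typically exposed by frontends such as KoboldCpp or SillyTavern rather than by this minimal API.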

## Recommended Settings

| Sampler | Range |
|---------|-------|
| temperature | 0.6-1 |
| top_nsigma | 1.2-1.34 |
| smoothing_factor | 0.2 |
| smoothing_curve | 1 |
| dry_multiplier | 0.2-0.3 |
| dry_base | 1.25-1.5 |
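
top_nsigma is probably the least familiar knob here: it keeps only tokens whose logit lies within n standard deviations of the maximum logit, so the 1.2-1.34 range above keeps a fairly tight nucleus. A rough numpy sketch of the idea, not llama.cpp's exact implementation:

```python
# Rough sketch of top-nsigma sampling; not the exact llama.cpp code.
# Tokens more than n standard deviations below the max logit are masked
# out before temperature sampling.
import numpy as np

def sample_top_nsigma(logits: np.ndarray, n: float = 1.3,
                      temperature: float = 0.8,
                      rng: np.random.Generator | None = None) -> int:
    rng = rng or np.random.default_rng()
    threshold = logits.max() - n * logits.std()      # cutoff: max - n*sigma
    masked = np.where(logits >= threshold, logits, -np.inf)
    scaled = (masked - masked.max()) / temperature   # numerically stable
    probs = np.exp(scaled)                           # exp(-inf) -> 0.0
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Example: a toy 5-token vocabulary; only the top cluster survives the mask.
print(sample_top_nsigma(np.array([8.0, 7.5, 2.0, 1.0, -3.0]), n=1.3))
```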

༼ つ ◕_◕ ༽つ

Please Test
