# Experimental "iQ4_W" Quantization of Cydonia-24B-v4.1
Original model: TheDrummer/Cydonia-24B-v4.1
## iQ4_W Quantization Scheme
"iQ4_W" is an unofficial llama.cpp quantization scheme inspired by Q4_K_X.
The GGUF model available in this repo is quantized as follows:
| Tensor name | Q4_K_X | iQ4_W |
|---|---|---|
| token_embd | Q4_K | Q5_K |
| ffn_gate | Q4_K | IQ4_XS |
| ffn_up | Q4_K | IQ4_XS |
| ffn_down | Q5_K | Q5_K |
| attn_q | Q4_K | Q5_K |
| attn_k | Q8_0 | Q8_0 |
| attn_v | Q8_0 | Q8_0 |
| attn_output | Q5_K | Q5_K |
| output | Q8_0 | Q6_K |
Layers 0, 1, 2, 38, and 39 are quantized wider:
| Tensor name | Q4_K_X | iQ4_W |
|---|---|---|
| (0/1/2/38/39).ffn_gate | Q4_K | Q5_K |
| (0/1/2/38/39).ffn_up | Q4_K | Q5_K |
| (0/1/2/38/39).ffn_down | Q5_K | Q6_K |
| (0/1/2/38/39).attn_q | Q4_K | Q6_K |
| (0/1/2/38/39).attn_output | Q5_K | Q6_K |
## KL-divergence from Q8_0
| Quant | BPW | Mean KLD | 99.9% KLD | 99.0% KLD | Median KLD |
|---|---|---|---|---|---|
| Q5_K_S | 5.53 | 0.010427 | 0.507101 | 0.145578 | 0.004048 |
| iQ4_W | 5.01 | 0.015892 | 0.987876 | 0.255739 | 0.004836 |
| Q4_K_X | 5.01 | 0.015803 | 1.001576 | 0.250834 | 0.004847 |
| Q4_K_M | 4.86 | 0.025398 | 1.533436 | 0.367073 | 0.007728 |
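The columns above are summary statistics over per-token KL-divergence values (such as those emitted by `llama-perplexity --kl-divergence`). A minimal sketch of how such a summary is computed, assuming a nearest-rank percentile; the exact percentile method llama.cpp uses may differ slightly:

```python
import statistics

def percentile(values, pct):
    """Nearest-rank percentile of a list of floats (pct in [0, 100])."""
    s = sorted(values)
    k = max(0, min(len(s) - 1, round(pct / 100 * (len(s) - 1))))
    return s[k]

def kld_summary(per_token_kld):
    """Summarize per-token KL-divergence values the way the table does:
    mean, median, and the 99.0% / 99.9% tail percentiles."""
    return {
        "mean": statistics.fmean(per_token_kld),
        "median": statistics.median(per_token_kld),
        "p99.0": percentile(per_token_kld, 99.0),
        "p99.9": percentile(per_token_kld, 99.9),
    }
```

The mean is dominated by a few hard tokens, which is why the 99.9% column (worst ~0.1% of tokens) separates the quants much more sharply than the median.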
## Usage
Chat template: Mistral v7 Tekken
### Recommended Settings
| Sampler | Range |
|---|---|
| temperature | 0.6-1 |
| top_nsigma | 1.2-1.34 |
| smoothing_factor | 0.2 |
| smoothing_curve | 1 |
| dry_multiplier | 0.2-0.3 |
| dry_base | 1.25-1.5 |
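top_nsigma keeps only tokens whose logit lies within n standard deviations of the maximum logit before sampling. A minimal sketch of that filter, assuming the standard Top-nσ definition rather than any particular backend's implementation:

```python
import statistics

def top_nsigma_filter(logits, n=1.2):
    """Top-nsigma filter: keep tokens whose logit is at least
    max(logits) - n * stdev(logits); mask the rest with -inf."""
    m = max(logits)
    sigma = statistics.pstdev(logits)
    threshold = m - n * sigma
    return [l if l >= threshold else float("-inf") for l in logits]
```

Because the threshold tracks the spread of the logits, the filter automatically admits more candidates when the model is uncertain (flat distribution) and fewer when it is confident (peaked distribution).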
༼ つ ◕_◕ ༽つ
Please Test