Update README.md
README.md CHANGED
@@ -20,8 +20,17 @@ This is a 4.125 EXL2 quant of [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingf
This quant was made using a [customized version](https://github.com/dinerburger/exllamav2/tree/max-quant-first-last) of exllamav2-0.2.7 (patch graciously provided by [DeusImperator](https://huggingface.co/DeusImperator)) with the default calibration dataset and an extended quantization sample length (8k instead of the default 2k). It also uses -head_bits=8 and a maximum-accuracy 8 bpw quant for the first and last layers, while all other layers use the normally chosen methods; both the approach and the name (4.125bpw_L) are inspired by the GGUF naming scheme.
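For reference, a quantization run with these settings would look roughly like the sketch below, which simply shells out to exllamav2's convert.py. The flag names (-i, -o, -cf, -b, -hb, -l) follow the upstream exllamav2 documentation, the paths are placeholders, and the 8 bpw treatment of the first and last layers comes from the patched branch linked above rather than from any command-line flag.

```
import subprocess

# Sketch only: paths are placeholders; run from a checkout of the patched exllamav2 branch.
subprocess.run(
    [
        "python", "convert.py",
        "-i", "/models/Qwen2.5-Coder-32B-Instruct",             # source FP16 model
        "-o", "/tmp/exl2-work",                                  # scratch dir for the measurement pass
        "-cf", "/models/Qwen2.5-Coder-32B-Instruct-4.125bpw_L",  # compiled output dir
        "-b", "4.125",                                           # target bits per weight
        "-hb", "8",                                              # -head_bits=8
        "-l", "8192",                                            # 8k quantization sample length
    ],
    check=True,
)
```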
## A note about context length
By default, this model caps out at 32K context. Additional configuration is required to unlock the full 128K context; namely, the following block must be added to config.json:
```
"rope_scaling": {
  "factor": 4.0,
  "original_max_position_embeddings": 32768,
  "type": "yarn"
}
```
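If you prefer not to hand-edit the file, the same change can be scripted. The short sketch below uses only the Python standard library; the model path is a placeholder for wherever you downloaded this quant.

```
import json
from pathlib import Path

# Placeholder path: point this at your local copy of the quantized model.
config_path = Path("/models/Qwen2.5-Coder-32B-Instruct-4.125bpw_L/config.json")

config = json.loads(config_path.read_text())

# Same YaRN rope_scaling block as above: 4.0 x 32768 = 131072 (128K) positions.
config["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

config_path.write_text(json.dumps(config, indent=2))
```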
Once this is done, you can push the model to 64K context using Q4 KV-cache quantization on a single 24GB VRAM card with minimal loss of accuracy.
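As an illustration, loading at 64K with a quantized cache might look like the sketch below. It follows the pattern from the exllamav2 examples (ExLlamaV2Cache_Q4 provides the Q4 KV cache); the model path is a placeholder, and the exact API should be checked against the exllamav2 version you have installed.

```
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/models/Qwen2.5-Coder-32B-Instruct-4.125bpw_L"  # placeholder path

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 65536  # 64K context; requires the rope_scaling edit above

model = ExLlamaV2(config)
# A Q4 KV cache keeps the 64K cache small enough for a single 24GB card.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=65536, lazy=True)
model.load_autosplit(cache, progress=True)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Write a Python function that merges two sorted lists.", max_new_tokens=256))
```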
# Qwen2.5-Coder-32B-Instruct Original Card
<a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">