Experimental global target bits‑per‑weight quantization of ServiceNow-AI/Apriel-1.6-15b-Thinker and zai-org/GLM-4.6V-Flash
These experimental versions were produced with custom builds of llama-imatrix and llama-quantize. The modified llama-imatrix generates an importance matrix (imatrix) that also records the mean activations. The modified llama-quantize computes, for each tensor, a weighted mean squared quantization error plus a bias/projection term (when the imatrix includes activations), and uses these scores to automatically select the lowest-error quantization recipe that meets a global target bits‑per‑weight (bpw).
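As a rough illustration of how such a recipe search could work, here is a minimal Python sketch. The error model (imatrix-weighted MSE plus a bias term projected onto the mean activations) and the greedy bit-budget allocation are assumptions for illustration only; `QuantCandidate`, `tensor_error`, and `select_recipe` are hypothetical names, not part of llama.cpp.

```python
# Minimal sketch of a global-bpw recipe search. All names and the greedy
# strategy are illustrative assumptions, not the actual llama-quantize code.
from dataclasses import dataclass
import numpy as np

@dataclass
class QuantCandidate:
    qtype: str          # e.g. "Q4_K" (hypothetical label)
    bpw: float          # bits per weight of this quant type
    w_hat: np.ndarray   # weights after a quantize/dequantize round trip

def tensor_error(w, w_hat, importance, mean_act):
    """Weighted MSE plus a bias/projection term from the mean activations."""
    r = w_hat - w                               # quantization residual
    wmse = float(np.mean(importance * r * r))   # imatrix-weighted squared error
    bias = float(np.mean((r @ mean_act) ** 2))  # systematic shift of the layer output
    return wmse + bias

def select_recipe(weights, candidates, importance, mean_act, target_bpw):
    """Pick one candidate per tensor so the size-weighted mean bpw stays at or
    below target_bpw, greedily spending bits where they cut error the most."""
    sizes = {n: w.size for n, w in weights.items()}
    total = sum(sizes.values())
    # Start every tensor at its cheapest available type.
    choice = {n: min(cs, key=lambda c: c.bpw) for n, cs in candidates.items()}

    def err(n, c):
        return tensor_error(weights[n], c.w_hat, importance[n], mean_act[n])

    while True:
        used = sum(choice[n].bpw * sizes[n] for n in weights) / total
        best = None
        for n, cs in candidates.items():
            for c in cs:
                extra = (c.bpw - choice[n].bpw) * sizes[n] / total
                if extra <= 0 or used + extra > target_bpw:
                    continue  # not an upgrade, or blows the bit budget
                gain = (err(n, choice[n]) - err(n, c)) / extra  # error saved per bit
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, n, c)
        if best is None:
            return choice  # no affordable upgrade reduces error further
        choice[best[1]] = best[2]
```

Given a quantize/dequantize round trip that supplies `w_hat` for each candidate type, the returned mapping is a per-tensor recipe whose size-weighted average stays at or below the global bpw target.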
More information in the model cards:
eaddario/Apriel-1.6-15b-Thinker-GGUF
eaddario/GLM-4.6V-Flash-GGUF