Update README.md
README.md (changed)
@@ -33,7 +33,7 @@ inference:
 - **Model Type:** GGUF quantized (q4_k_m and q8_0)
 - **Base Model:** unsloth/llama-3-8b-bnb-4bit
 - **Quantization Details:**
-  - Methods: q4_k_m
+  - Methods: q4_k_m, q8_0, BF16
 - q4_k_m uses Q6_K for half of attention.wv and feed_forward.w2 tensors
 - Optimized for both speed (q8_0) and quality (q4_k_m)
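For context on how a quant listed above might be consumed, here is a minimal sketch using llama-cpp-python; the file name `llama-3-8b.Q4_K_M.gguf`, the context size, and the generation settings are illustrative assumptions, not details taken from this README.

```python
# Minimal sketch (assumed setup): load the q4_k_m GGUF file with llama-cpp-python
# and run a short chat completion. File name and parameters are illustrative only.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3-8b.Q4_K_M.gguf",  # assumed local name for the q4_k_m quant
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available; use 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

The same call works for the q8_0 or BF16 files by pointing `model_path` at the corresponding GGUF.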