Update README.md
README.md (changed)
@@ -19,17 +19,17 @@ base_model:
 - **KV cache quantization:** OCP FP8
 - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

-The model is the quantized version of the [
+The model is the quantized version of the [Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check [here](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct). The MXFP4 model is quantized with [AMD-Quark](https://quark.docs.amd.com/latest/index.html).


 # Model Quantization

-This model was obtained by quantizing [
+This model was obtained by quantizing [Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct)'s weights and activations to MXFP4 and KV caches to FP8, using the AutoSmoothQuant algorithm in [AMD-Quark](https://quark.docs.amd.com/latest/index.html).

 **Quantization scripts:**
 ```
 cd Quark/examples/torch/language_modeling/llm_ptq/
-python3 quantize_quark.py --model_dir "meta-llama/
+python3 quantize_quark.py --model_dir "meta-llama/Llama-3.1-405B-Instruct" \
     --model_attn_implementation "sdpa" \
     --quant_scheme w_mxfp4_a_mxfp4 \
     --kv_cache_dtype fp8 \

@@ -56,9 +56,9 @@ Evaluation was conducted using the framework [lm-evaluation-harness](https://git
 <tr>
 <td><strong>Benchmark</strong>
 </td>
-<td><strong>
+<td><strong>Llama-3.1-405B-Instruct</strong>
 </td>
-<td><strong>
+<td><strong>Llama-3.1-405B-Instruct-MXFP4 (this model)</strong>
 </td>
 <td><strong>Recovery</strong>
 </td>
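For readers unfamiliar with the `w_mxfp4_a_mxfp4` scheme in the command above: OCP MXFP4 groups a tensor into blocks of 32 elements that share one power-of-two (E8M0) scale, and stores each element as a 4-bit FP4 (E2M1) value whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6. The snippet below is a rough NumPy sketch of that rounding behaviour (fake quantization only); it is not Quark's implementation, and the scale-selection rule shown is just one common choice.

```
import numpy as np

# Representable magnitudes of FP4 E2M1, the MXFP4 element type.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_fake_quant(x, block_size=32):
    """Round a 1-D tensor to MXFP4 and dequantize it again ("fake quantization").
    Every block of 32 elements shares one power-of-two (E8M0) scale; each
    element is then rounded to the nearest representable FP4 E2M1 value."""
    x = x.reshape(-1, block_size)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # One common scale choice: place the block maximum near the FP4 range [-6, 6].
    scale = 2.0 ** (np.floor(np.log2(np.maximum(amax, 1e-30))) - 2)
    scaled = x / scale
    # Nearest-value rounding onto the signed FP4 grid (values beyond 6 clamp to 6).
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(scaled) * FP4_GRID[idx] * scale).reshape(-1)

x = np.random.default_rng(0).normal(size=64)
print("mean |error|:", np.abs(mxfp4_fake_quant(x) - x).mean())
```

A real MXFP4 checkpoint stores the 4-bit codes and per-block scales directly; the sketch dequantizes immediately so the rounding effect is easy to inspect.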
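The AutoSmoothQuant algorithm named above builds on SmoothQuant's trick of migrating activation outliers into the weights before quantization: activations are divided by a per-channel scale `s` and the same `s` is folded into the weight matrix, so the layer output is unchanged while the rescaled activations become easier to quantize ("auto" variants search the scales per layer rather than fixing the exponent alpha by hand). Below is a minimal NumPy sketch of that rescaling step using the standard SmoothQuant formula, not Quark's actual code.

```
import numpy as np

def smoothquant_rescale(X, W, alpha=0.5):
    """SmoothQuant-style scale migration: divide activations by a per-channel
    scale s and fold s into the weights, so (X / s) @ (s[:, None] * W) == X @ W,
    while the rescaled activations have much smaller outliers."""
    act_max = np.abs(X).max(axis=0)   # per input channel, from calibration data
    w_max = np.abs(W).max(axis=1)     # per input channel of the weight
    s = (act_max ** alpha) / np.maximum(w_max ** (1 - alpha), 1e-8)
    s = np.maximum(s, 1e-8)
    return X / s, s[:, None] * W

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8)) * np.array([1, 1, 1, 20, 1, 1, 1, 1])  # channel 3 is an outlier
W = rng.normal(size=(8, 4))
Xs, Ws = smoothquant_rescale(X, W)
assert np.allclose(X @ W, Xs @ Ws)   # layer output is mathematically unchanged
print(np.abs(X).max(axis=0).round(1), "->", np.abs(Xs).max(axis=0).round(1))
```

Both the rescaled activations and weights are then quantized (here to MXFP4), which is where the accuracy benefit over naive quantization comes from.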
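In the evaluation table, Recovery is read as the quantized model's score expressed as a percentage of the BF16 baseline's score (assuming the usual convention on these model cards), for example:

```
def recovery(quantized_score: float, baseline_score: float) -> float:
    """Percentage of the baseline benchmark score retained after quantization."""
    return 100.0 * quantized_score / baseline_score

# Illustrative numbers only; see the table for the measured values.
print(f"{recovery(86.9, 87.4):.2f}%")  # -> 99.43%
```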