---
language:
- en
pipeline_tag: text-generation
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
---

# DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu

## Introduction
This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.

## Quantization Strategy
- ***Quantized Layers***: All linear layers
- ***Weight***: uint4 asymmetric per-group, group_size=128 (see the sketch below)
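
To make the scheme concrete, here is a minimal fake-quantization sketch of asymmetric uint4 per-group weight quantization with group_size=128. It is illustrative only and is not Quark's implementation; the function name and tensor shapes are hypothetical.

```python
import torch

def fake_quant_uint4_per_group(w: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Fake-quantize a 2-D weight to asymmetric uint4, per group of
    `group_size` input channels (illustrative sketch, not Quark's code)."""
    out_feats, in_feats = w.shape
    g = w.reshape(out_feats, in_feats // group_size, group_size)
    w_min = g.amin(dim=-1, keepdim=True)
    w_max = g.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0   # uint4 levels: 0..15
    zero = torch.round(-w_min / scale).clamp(0, 15)  # asymmetric zero point
    q = torch.round(g / scale + zero).clamp(0, 15)   # integer uint4 codes
    return ((q - zero) * scale).reshape(out_feats, in_feats)  # dequantized weight

# Example: measure the error this scheme introduces on a random weight.
w = torch.randn(4096, 4096)
print((w - fake_quant_uint4_per_group(w)).abs().mean())
```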

## Quick Start
1. [Download and install Quark](https://quark.docs.amd.com/latest/install.html)
2. Run the quantization script in the example folder using the following command line:
```sh
# Point MODEL_DIR at a local checkpoint folder, or use the Hugging Face model ID.
export MODEL_DIR=[local model checkpoint folder or deepseek-ai/DeepSeek-R1-Distill-Llama-8B]
export MODEL_NAME=DeepSeek-R1-Distill-Llama-8B
# single GPU
python quantize_quark.py --model_dir $MODEL_DIR \
                         --output_dir $MODEL_NAME-awq-asym-uint4-g128-lmhead \
                         --quant_scheme w_uint4_per_group_asym \
                         --num_calib_data 128 \
                         --quant_algo awq \
                         --dataset pileval_for_awq_benchmark \
                         --seq_len 512 \
                         --model_export hf_format \
                         --data_type bfloat16 \
                         --exclude_layers
# cpu
python quantize_quark.py --model_dir $MODEL_DIR \
                         --output_dir $MODEL_NAME-awq-asym-uint4-g128-lmhead \
                         --quant_scheme w_uint4_per_group_asym \
                         --num_calib_data 128 \
                         --quant_algo awq \
                         --dataset pileval_for_awq_benchmark \
                         --seq_len 512 \
                         --model_export hf_format \
                         --data_type bfloat16 \
                         --exclude_layers \
                         --device cpu
```
## Deployment
Quark has its own export format, `quark_safetensors`, which is compatible with AutoAWQ exports.
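
Because the export is AutoAWQ-compatible, a checkpoint exported this way can in principle be loaded with the AutoAWQ runtime. The sketch below assumes a hypothetical local path `./DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead` produced by the command above and a CUDA GPU; it is a minimal illustration, not an officially validated deployment recipe.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Hypothetical output folder from the quantization step above.
quant_path = "./DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead"

tokenizer = AutoTokenizer.from_pretrained(quant_path)
model = AutoAWQForCausalLM.from_quantized(quant_path)  # loads AWQ-format weights

# Assumes a CUDA device is available.
tokens = tokenizer("The capital of France is", return_tensors="pt").input_ids.cuda()
outputs = model.generate(tokens, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```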
## Evaluation
Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py.
The quantization evaluation results are conducted in pseudo-quantization mode, which may differ slightly from the actual quantized inference accuracy. These results are provided for reference only.
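
For reference, the standard way to compute wikitext2 perplexity with a causal LM looks roughly like the sketch below (fixed 2048-token windows, no overlap). This is a generic illustration, not the exact algorithm in quantize_quark.py; the model path is a placeholder.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # placeholder: baseline or quantized checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model.eval()

# Concatenate the wikitext2 test split into one token stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

seq_len, nlls = 2048, []
for begin in range(0, ids.size(1) - seq_len, seq_len):
    chunk = ids[:, begin:begin + seq_len]
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss  # mean token NLL for this window
    nlls.append(loss)

ppl = torch.exp(torch.stack(nlls).mean())
print(f"wikitext2 perplexity: {ppl.item():.4f}")
```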
#### Evaluation scores
| Benchmark | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | amd/DeepSeek-R1-Distill-Llama-8B-awq-asym-uint4-g128-lmhead-onnx-cpu (this model) |
| --- | --- | --- |
| Perplexity-wikitext2 | 13.1432 | |

#### License
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.