satreysa committed on
Commit a1844d2 · verified · 1 Parent(s): 5374ada

Readme update

Files changed (1):
  1. README.md +36 -52
README.md CHANGED
@@ -1,52 +1,36 @@
- ---
- license: mit
- language:
- - en
- base_model:
- - meta-llama/CodeLlama-7b-Instruct-hf
- pipeline_tag: text-generation
- ---
-
- Quark Version used - https://gitenterprise.xilinx.com/AMDNeuralOpt/Quark/tree/v0.8rc3
-
- Quark Command:
-
- ```
- python3 quantize_quark.py \
- --model_dir models/CodeLlama-7b-Instruct-hf \
- --output_dir quantized_safetensor/CodeLlama-7b-Instruct-gs128-AWQ-Quant \
- --quant_scheme w_uint4_per_group_asym \
- --num_calib_data 128 \
- --quant_algo awq \
- --dataset pileval_for_awq_benchmark \
- --model_export hf_format \
- --group_size 128 \
- --data_type float32 \
- --exclude_layers
- ```
-
- Perplexity after quantization: 7.181363105773926
-
- OGA Model builder: https://gitenterprise.xilinx.com/AMDNeuralOpt/onnxruntime-genai/tree/model_builder_rai_1.4
-
- OGA Model builder command:
-
- ```
- python builder.py \
- -i <quantized safetensor model dir> \
- -o <oga model output dir> \
- -p int4 \
- -e dml
- ```
-
- ## Performance Measure
- | Prompt_len | TTFS | TPS |
- |------------|-------|--------|
- | 128 | 0.733 | 20.411 |
- | 256 | 0.873 | 20.487 |
- | 512 | 1.386 | 19.789 |
- | 1024 | 2.526 | 18.627 |
- | 2048 | 5.088 | 16.662 |
-
- Perplexity of hardware run: **7.261**
-
 
+ ---
+ license: llama2
+ base_model:
+ - meta-llama/CodeLlama-7b-Instruct-hf
+ ---
+
+ # amd/CodeLlama-7b-instruct-g128-hybrid
+ - ## Introduction
+ This model was prepared using the AMD Quark Quantization tool, followed by necessary post-processing.
+
+ - ## Quantization Strategy
+ - AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
+ - Excluded Layers: None
+
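The weight scheme above can be illustrated with a minimal sketch of asymmetric unsigned-int quantization. This is not the Quark implementation: the group size is shrunk to 4 for brevity, and the helper names are hypothetical.

```python
# Illustrative sketch of asymmetric per-group UINT4 quantization
# (group size reduced to 4 for brevity; not the Quark implementation).
def quantize_group(weights, bits=4):
    """Map a group of floats to unsigned ints: q = round(w / scale) + zero_point."""
    qmax = (1 << bits) - 1                      # 15 for UINT4
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0       # guard against all-equal groups
    zero_point = round(-w_min / scale)          # asymmetric: shift so w_min -> 0
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_group(q, scale, zero_point):
    """Recover approximate floats from the quantized group."""
    return [(qi - zero_point) * scale for qi in q]

group = [0.12, -0.40, 0.33, 0.05]               # hypothetical weight group
q, s, z = quantize_group(group)
recon = dequantize_group(q, s, z)
```

With group size 128, one `(scale, zero_point)` pair is stored per 128 weights, which is what the `--group_size 128` setting controls.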
+ - ## Quick Start
+ For a quick start, refer to the [Ryzen AI documentation](https://ryzenai.docs.amd.com/en/latest/hybrid_oga.html).
+
+ #### Evaluation scores
+ The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. The perplexity score measured for a prompt length of 2k is 7.260.
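For context, perplexity is the exponential of the mean per-token negative log-likelihood over the evaluation text. A minimal sketch, with made-up probabilities standing in for real model outputs over wikitext-2-raw-v1:

```python
import math

# Illustrative sketch: perplexity = exp(mean negative log-likelihood).
# The probabilities below are hypothetical; a real run takes each token's
# probability from the model over the evaluation dataset.
def perplexity(token_probs):
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

probs = [0.25, 0.10, 0.50, 0.05]   # hypothetical per-token probabilities
ppl = perplexity(probs)
```

Lower is better; a perfectly confident model (every token probability 1.0) would score a perplexity of exactly 1.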
+
+ #### License
+ Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
+
+ MIT License
+
+ Copyright (c) 2024 Advanced Micro Devices, Inc
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.