satreysa committed on
Commit a1844d2 · verified · 1 Parent(s): 5374ada

Readme update

Files changed (1):
  1. README.md +36 -52
README.md CHANGED
@@ -1,52 +1,36 @@
- ---
- license: mit
- language:
- - en
- base_model:
- - meta-llama/CodeLlama-7b-Instruct-hf
- pipeline_tag: text-generation
- ---
-
- Quark Version used - https://gitenterprise.xilinx.com/AMDNeuralOpt/Quark/tree/v0.8rc3
-
- Quark Command:
-
- ```
- python3 quantize_quark.py \
- --model_dir models/CodeLlama-7b-Instruct-hf \
- --output_dir quantized_safetensor/CodeLlama-7b-Instruct-gs128-AWQ-Quant \
- --quant_scheme w_uint4_per_group_asym \
- --num_calib_data 128 \
- --quant_algo awq \
- --dataset pileval_for_awq_benchmark \
- --model_export hf_format \
- --group_size 128 \
- --data_type float32 \
- --exclude_layers
- ```
-
- Perplexity after quantization: 7.181363105773926
-
- OGA Model builder: https://gitenterprise.xilinx.com/AMDNeuralOpt/onnxruntime-genai/tree/model_builder_rai_1.4
-
- OGA Model builder command:
-
- ```
- python builder.py \
- -i <quantized safetensor model dir> \
- -o <oga model output dir> \
- -p int4 \
- -e dml
- ```
-
- ## Performance Measure
- | Prompt_len | TTFS | TPS |
- |------------|-------|--------|
- | 128 | 0.733 | 20.411 |
- | 256 | 0.873 | 20.487 |
- | 512 | 1.386 | 19.789 |
- | 1024 | 2.526 | 18.627 |
- | 2048 | 5.088 | 16.662 |
-
- Perplexity of hardware run: **7.261**
-
 
+ ---
+ license: llama2
+ base_model:
+ - meta-llama/CodeLlama-7b-Instruct-hf
+ ---
+
+ # amd/CodeLlama-7b-instruct-g128-hybrid
+ - ## Introduction
+ This model was prepared using the AMD Quark Quantization tool, followed by necessary post-processing.
+
+ - ## Quantization Strategy
+ - AWQ / Group 128 / Asymmetric / UINT4 Weights / FP16 activations
+ - Excluded Layers: None
+
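The weight scheme above can be illustrated with a minimal sketch of asymmetric unsigned-int quantization. This is not the Quark implementation: the group size is shrunk to 4 for brevity, and the helper names are hypothetical.

```python
# Illustrative sketch of asymmetric per-group UINT4 quantization
# (group size reduced to 4 for brevity; not the Quark implementation).
def quantize_group(weights, bits=4):
    """Map a group of floats to unsigned ints: q = round(w / scale) + zero_point."""
    qmax = (1 << bits) - 1                      # 15 for UINT4
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0       # guard against all-equal groups
    zero_point = round(-w_min / scale)          # asymmetric: shift so w_min -> 0
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_group(q, scale, zero_point):
    """Recover approximate floats from the quantized group."""
    return [(qi - zero_point) * scale for qi in q]

group = [0.12, -0.40, 0.33, 0.05]               # hypothetical weight group
q, s, z = quantize_group(group)
recon = dequantize_group(q, s, z)
```

With group size 128, one `(scale, zero_point)` pair is stored per 128 weights, which is what the `--group_size 128` setting controls.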
+ - ## Quick Start
+ For a quick start, refer to the [Ryzen AI documentation](https://ryzenai.docs.amd.com/en/latest/hybrid_oga.html).
+
+ #### Evaluation scores
+ The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. The perplexity score measured for a prompt length of 2k is 7.260.
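For context, perplexity is the exponential of the mean per-token negative log-likelihood over the evaluation text. A minimal sketch, with made-up probabilities standing in for real model outputs over wikitext-2-raw-v1:

```python
import math

# Illustrative sketch: perplexity = exp(mean negative log-likelihood).
# The probabilities below are hypothetical; a real run takes each token's
# probability from the model over the evaluation dataset.
def perplexity(token_probs):
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

probs = [0.25, 0.10, 0.50, 0.05]   # hypothetical per-token probabilities
ppl = perplexity(probs)
```

Lower is better; a perfectly confident model (every token probability 1.0) would score a perplexity of exactly 1.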
+
+ #### License
+ Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
+
+ MIT License
+
+ Copyright (c) 2024 Advanced Micro Devices, Inc
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.