cpatonn committed
Commit 080df73 · verified · 1 Parent(s): a56bfb2

Update README.md

Files changed (1):
  1. README.md +18 -0
README.md CHANGED
@@ -10,6 +10,24 @@ language:
  base_model:
  - ByteDance-Seed/Seed-OSS-36B-Instruct
  ---
+ # Seed-OSS-36B-Instruct-AWQ-8bit
+
+ ## Method
+ [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) and [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) were used to quantize the original model. For the full quantization arguments and configuration, see [config.json](https://huggingface.co/cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit/blob/main/config.json) and [recipe.yaml](https://huggingface.co/cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit/blob/main/recipe.yaml).
+
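+ Both files can also be pulled and inspected locally with `huggingface_hub`. The snippet below is a small convenience sketch, not part of the quantization pipeline itself; it assumes the `quantization_config` key that llm-compressor writes into config.json is present, and reads it defensively in case the layout differs:
+ ```
+ # Download and print the quantization recipe and config referenced above.
+ import json
+ from huggingface_hub import hf_hub_download
+
+ repo = "cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit"
+ recipe_path = hf_hub_download(repo_id=repo, filename="recipe.yaml")
+ config_path = hf_hub_download(repo_id=repo, filename="config.json")
+
+ print(open(recipe_path).read())  # llm-compressor recipe used for quantization
+ with open(config_path) as f:
+     print(json.load(f).get("quantization_config"))  # compressed-tensors settings
+ ```
+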
+ ## Inference
+
+ ### Prerequisite
+ To get the ```SeedOssForCausalLM``` implementation, install transformers from source:
+ ```
+ git clone https://github.com/huggingface/transformers.git
+ cd transformers
+ pip install .[torch]
+ ```
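+
+ After installing, a quick sanity check confirms the architecture is available; this assumes the class is exported at the top level under the name used above, and if it is missing the installed transformers build likely predates Seed-OSS support:
+ ```
+ # Verify the source install of transformers exposes the Seed-OSS model class.
+ import transformers
+
+ print(transformers.__version__)
+ print(hasattr(transformers, "SeedOssForCausalLM"))  # should print True
+ ```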
+ ### vllm
+ ```
+ vllm serve cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit --tensor-parallel-size 4
+ ```
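+
+ The command above shards the model across four GPUs via `--tensor-parallel-size 4`; adjust that value to your hardware. Once running, the server exposes an OpenAI-compatible API (on port 8000 by default), so it can be queried with, for example, the `openai` Python client; the prompt below is only an illustration:
+ ```
+ # Minimal client call against the local vLLM server started above.
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+ resp = client.chat.completions.create(
+     model="cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit",
+     messages=[{"role": "user", "content": "Briefly introduce yourself."}],
+     max_tokens=128,
+ )
+ print(resp.choices[0].message.content)
+ ```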
 
  <div align="center">
  👋 Hi, everyone!