cpatonn committed
Commit 080df73 · verified · 1 Parent(s): a56bfb2

Update README.md

Files changed (1):
  1. README.md +18 -0
README.md CHANGED
@@ -10,6 +10,24 @@ language:
  base_model:
  - ByteDance-Seed/Seed-OSS-36B-Instruct
  ---
+ # Seed-OSS-36B-Instruct-AWQ-8bit
+
+ ## Method
+ [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) and [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) were used to quantize the original model. For the full quantization arguments and configuration, see [config.json](https://huggingface.co/cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit/blob/main/config.json) and [recipe.yaml](https://huggingface.co/cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit/blob/main/recipe.yaml).
+
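+ Both files can also be pulled and inspected locally with `huggingface_hub`. The snippet below is a small convenience sketch, not part of the quantization pipeline itself; it assumes the `quantization_config` key that llm-compressor writes into config.json is present, and reads it defensively in case the layout differs:
+ ```
+ # Download and print the quantization recipe and config referenced above.
+ import json
+ from huggingface_hub import hf_hub_download
+
+ repo = "cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit"
+ recipe_path = hf_hub_download(repo_id=repo, filename="recipe.yaml")
+ config_path = hf_hub_download(repo_id=repo, filename="config.json")
+
+ print(open(recipe_path).read())  # llm-compressor recipe used for quantization
+ with open(config_path) as f:
+     print(json.load(f).get("quantization_config"))  # compressed-tensors settings
+ ```
+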
+ ## Inference
+
+ ### Prerequisite
+ To get the ```SeedOssForCausalLM``` implementation, install transformers from source:
+ ```
+ git clone https://github.com/huggingface/transformers.git
+ cd transformers
+ pip install .[torch]
+ ```
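+
+ After installing, a quick sanity check confirms the architecture is available; this assumes the class is exported at the top level under the name used above, and if it is missing the installed transformers build likely predates Seed-OSS support:
+ ```
+ # Verify the source install of transformers exposes the Seed-OSS model class.
+ import transformers
+
+ print(transformers.__version__)
+ print(hasattr(transformers, "SeedOssForCausalLM"))  # should print True
+ ```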
+ ### vllm
+ ```
+ vllm serve cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit --tensor-parallel-size 4
+ ```
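+
+ The command above shards the model across four GPUs via `--tensor-parallel-size 4`; adjust that value to your hardware. Once running, the server exposes an OpenAI-compatible API (on port 8000 by default), so it can be queried with, for example, the `openai` Python client; the prompt below is only an illustration:
+ ```
+ # Minimal client call against the local vLLM server started above.
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+ resp = client.chat.completions.create(
+     model="cpatonn/Seed-OSS-36B-Instruct-AWQ-8bit",
+     messages=[{"role": "user", "content": "Briefly introduce yourself."}],
+     max_tokens=128,
+ )
+ print(resp.choices[0].message.content)
+ ```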
 
  <div align="center">
  👋 Hi, everyone!