Update README.md
model.save_pretrained(output_dir, save_safetensors=True, save_compressed=False)
tokenizer.save_pretrained(output_dir)
```

## Inference

### Prerequisite

Install the latest vLLM nightly build:

```
pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly
```
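After installing, you can confirm the package is visible to Python before serving. A minimal sketch of such a check; it only inspects the installed distribution and does not import vllm itself:

```python
from importlib import metadata, util

# Look up vllm without importing it (a full import initializes GPU state,
# which is unnecessary for a version check).
spec = util.find_spec("vllm")
if spec is not None:
    msg = f"vllm {metadata.version('vllm')}"
else:
    msg = "vllm not installed; run the pip command above"
print(msg)
```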

### vllm

For Ampere devices, please use the `TRITON_ATTN_VLLM_V1` attention backend, e.g.:

```
VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve cpatonn/gpt-oss-120b-BF16 --async-scheduling
```
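Once the server is up, vLLM exposes an OpenAI-compatible API (on port 8000 by default). A minimal sketch of the request body you would POST to it; the prompt text and `max_tokens` value here are illustrative:

```python
import json

# vLLM's OpenAI-compatible server listens on port 8000 by default.
url = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "cpatonn/gpt-oss-120b-BF16",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,  # illustrative cap on the reply length
}

body = json.dumps(payload)
# With the server running, POST `body` to `url` with
# Content-Type: application/json, e.g.:
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
print(body)
```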

For further information, please visit this [guide](https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html).

# gpt-oss-120b
<p align="center">