cpatonn commited on
Commit
bbd4f6b
·
verified ·
1 Parent(s): bd65a16

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +17 -0
README.md CHANGED
@@ -30,7 +30,24 @@ tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
30
  model.save_pretrained(output_dir, save_safetensors=True, save_compressed=False)
31
  tokenizer.save_pretrained(output_dir)
32
  ```
 
33
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
34
 
35
  # gpt-oss-120b
36
  <p align="center">
 
30
  model.save_pretrained(output_dir, save_safetensors=True, save_compressed=False)
31
  tokenizer.save_pretrained(output_dir)
32
  ```
33
+ ## Inference
34
 
35
+ ### Prerequisite
36
+ Install the latest vllm version:
37
+ ```
38
+ pip install -U vllm \
39
+ --pre \
40
+ --extra-index-url https://wheels.vllm.ai/nightly
41
+ ```
42
+
43
+ ### vllm
44
+
45
+ For Ampere devices, please use TRITON_ATTN_VLLM_V1 attention backend i.e.,
46
+ ```
47
+ VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve cpatonn/gpt-oss-120b-BF16 --async-scheduling
48
+ ```
49
+
50
+ For further information, please visit this [guide](https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html).
51
 
52
  # gpt-oss-120b
53
  <p align="center">