n1ck-guo committed
Commit d3a3bd9 · verified · 1 Parent(s): f7ebeb5

Update README.md

Files changed (1): README.md (+0 −16)
README.md CHANGED
@@ -53,22 +53,6 @@ outputs = pipe(messages, max_new_tokens=512)
 print(outputs[0]["generated_text"][-1])
 ```
 
-### Inference with vLLM
-```bash
-uv pip install --pre vllm==0.10.1+gptoss \
-  --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
-  --extra-index-url https://download.pytorch.org/whl/nightly/cu128
-vllm serve Intel/gpt-oss-20b-int4-g64-rtn-AutoRound
-```
-
-### Inference with Ollama
-```bash
-ollama pull Intel/gpt-oss-20b-int4-g64-rtn-AutoRound
-ollama run Intel/gpt-oss-20b-int4-g64-rtn-AutoRound
-```
-
-The model supports the harmony response format for consistent interaction. Ensure the appropriate format is applied when using direct model generation.[](https://huggingface.co/Intel/gpt-oss-20b-int4-AutoRound)[](https://github.com/openai/gpt-oss)
-
 ## Hardware Requirements
 
 - **Minimum**: 16GB VRAM for local inference (e.g., NVIDIA RTX 3090)
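For context on the removed section: `vllm serve` exposes an OpenAI-compatible HTTP API, by default at `http://localhost:8000/v1`. A minimal stdlib-only client sketch, assuming that default endpoint and a hypothetical prompt (the request itself is left commented out, since it requires the server to be running):

```python
import json
import urllib.request

# Chat-completions payload for the OpenAI-compatible endpoint that
# `vllm serve Intel/gpt-oss-20b-int4-g64-rtn-AutoRound` exposes.
# Endpoint URL and prompt are illustrative assumptions.
payload = {
    "model": "Intel/gpt-oss-20b-int4-g64-rtn-AutoRound",
    "messages": [{"role": "user", "content": "Explain INT4 quantization briefly."}],
    "max_tokens": 256,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is up:
# with urllib.request.urlopen(req) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```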