Intel/gpt-oss-20b-int4-g64-rtn-AutoRound

4-bit precision


n1ck-guo committed 4 days ago

Commit d3a3bd9 (verified) · 1 parent: f7ebeb5

Update README.md

Files changed (1)

README.md +0 -16

README.md CHANGED

@@ -53,22 +53,6 @@ outputs = pipe(messages, max_new_tokens=512)
 print(outputs[0]["generated_text"][-1])
 ```
-### Inference with vLLM
-```bash
-uv pip install --pre vllm==0.10.1+gptoss \
-    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
-    --extra-index-url https://download.pytorch.org/whl/nightly/cu128
-vllm serve Intel/gpt-oss-20b-int4-g64-rtn-AutoRound
-```
-### Inference with Ollama
-```bash
-ollama pull Intel/gpt-oss-20b-int4-g64-rtn-AutoRound
-ollama run Intel/gpt-oss-20b-int4-g64-rtn-AutoRound
-```
-The model supports the harmony response format for consistent interaction. Ensure the appropriate format is applied when using direct model generation.[](https://huggingface.co/Intel/gpt-oss-20b-int4-AutoRound)[](https://github.com/openai/gpt-oss)
 ## Hardware Requirements
 - **Minimum**: 16GB VRAM for local inference (e.g., NVIDIA RTX 3090)
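The hunk header `@@ -53,22 +53,6 @@` accounts for the `+0 -16` summary. The old side covers 22 lines starting at line 53, and the new side covers 6 lines at the same position. In a unified diff, old_count = context + deletions and new_count = context + additions. With no additions (`+0`), the deletions come to 22 - 6 = 16. The following is a minimal Python sketch that parses such a header. It assumes the standard unified-diff header syntax, where an omitted count defaults to 1. The `Hunk` type and `parse_hunk_header` function are hypothetical helpers written for illustration and are not part of Git or Hugging Face tooling.

```python
import re
from typing import NamedTuple


class Hunk(NamedTuple):
    """Line ranges described by one unified-diff hunk header (hypothetical helper type)."""
    old_start: int
    old_count: int
    new_start: int
    new_count: int


# "@@ -old_start[,old_count] +new_start[,new_count] @@ [optional section heading]"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Hunk:
    """Parse a hunk header; an omitted count defaults to 1, as in the unified format."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = (
        int(g) if g is not None else 1 for g in m.groups()
    )
    return Hunk(old_start, old_count, new_start, new_count)


# Header copied from the README.md hunk in this commit; the text after the
# second "@@" is Git's section heading and is ignored by the regex.
header = '@@ -53,22 +53,6 @@ outputs = pipe(messages, max_new_tokens=512)'
h = parse_hunk_header(header)

# deletions - additions == old_count - new_count; with 0 additions this is the deletion count.
removed = h.old_count - h.new_count
print(h)
print(removed)  # 16, matching "README.md +0 -16"
```

The net difference `old_count - new_count` equals the number of deleted lines only because this commit adds nothing. A hunk with both additions and deletions would need its `+`/`-` lines counted separately.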