Update README.md
README.md CHANGED
````diff
@@ -30,7 +30,7 @@ Then we can serve with the following command:
 ```Shell
 # Server
 export MODEL=pytorch/Qwen3-8B-int4wo-hqq
-vllm serve $MODEL --tokenizer $MODEL -O3
+VLLM_DISABLE_COMPILE_CACHE=1 vllm serve $MODEL --tokenizer $MODEL -O3
 ```
 
 ```Shell
@@ -47,39 +47,6 @@ curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/jso
 }'
 ```
 
-
-## Code Example
-```Py
-from vllm import LLM, SamplingParams
-
-# Sample prompts.
-prompts = [
-    "Hello, my name is",
-    "The president of the United States is",
-    "The capital of France is",
-    "The future of AI is",
-]
-# Create a sampling params object.
-sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
-
-
-if __name__ == '__main__':
-    # Create an LLM.
-    llm = LLM(model="pytorch/Qwen3-8B-int4wo-hqq")
-    # Generate texts from the prompts.
-    # The output is a list of RequestOutput objects
-    # that contain the prompt, generated text, and other information.
-    outputs = llm.generate(prompts, sampling_params)
-    # Print the outputs.
-    print("\nGenerated Outputs:\n" + "-" * 60)
-    for output in outputs:
-        prompt = output.prompt
-        generated_text = output.outputs[0].text
-        print(f"Prompt: {prompt!r}")
-        print(f"Output: {generated_text!r}")
-        print("-" * 60)
-```
-
 Note: please use `VLLM_DISABLE_COMPILE_CACHE=1` to disable the compile cache when running this code, e.g. `VLLM_DISABLE_COMPILE_CACHE=1 python example.py`, since there are some issues with the composability of compile in vLLM and torchao;
 this is expected to be resolved in PyTorch 2.8.
````
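For reference, the serve-and-query flow after this change can be sketched as below. The curl request body is an assumed minimal OpenAI-compatible payload for illustration only; the README's actual curl command is elided in this diff, so the message content and payload shape here are hypothetical, not the exact command from the README.

```shell
# Launch the server with the compile cache disabled, per the change above.
export MODEL=pytorch/Qwen3-8B-int4wo-hqq
VLLM_DISABLE_COMPILE_CACHE=1 vllm serve "$MODEL" --tokenizer "$MODEL" -O3

# In another shell: an assumed minimal chat-completions request against
# vLLM's OpenAI-compatible endpoint (payload is illustrative).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pytorch/Qwen3-8B-int4wo-hqq",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```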