jerryzh168 committed
Commit c4cdc50 · verified · 1 Parent(s): 553f5a3

Update README.md

Files changed (1)
  1. README.md +1 -34
README.md CHANGED
@@ -30,7 +30,7 @@ Then we can serve with the following command:
 ```Shell
 # Server
 export MODEL=pytorch/Qwen3-8B-int4wo-hqq
-vllm serve $MODEL --tokenizer $MODEL -O3
+VLLM_DISABLE_COMPILE_CACHE=1 vllm serve $MODEL --tokenizer $MODEL -O3
 ```
 
 ```Shell
@@ -47,39 +47,6 @@ curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/jso
 }'
 ```
 
-
-## Code Example
-```Py
-from vllm import LLM, SamplingParams
-
-# Sample prompts.
-prompts = [
-    "Hello, my name is",
-    "The president of the United States is",
-    "The capital of France is",
-    "The future of AI is",
-]
-# Create a sampling params object.
-sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
-
-
-if __name__ == '__main__':
-    # Create an LLM.
-    llm = LLM(model="pytorch/Qwen3-8B-int4wo-hqq")
-    # Generate texts from the prompts.
-    # The output is a list of RequestOutput objects
-    # that contain the prompt, generated text, and other information.
-    outputs = llm.generate(prompts, sampling_params)
-    # Print the outputs.
-    print("\nGenerated Outputs:\n" + "-" * 60)
-    for output in outputs:
-        prompt = output.prompt
-        generated_text = output.outputs[0].text
-        print(f"Prompt: {prompt!r}")
-        print(f"Output: {generated_text!r}")
-        print("-" * 60)
-```
-
 Note: please use `VLLM_DISABLE_COMPILE_CACHE=1` to disable compile cache when running this code, e.g. `VLLM_DISABLE_COMPILE_CACHE=1 python example.py`, since there are some issues with the composability of compile in vLLM and torchao;
 this is expected to be resolved in PyTorch 2.8.
 
 
 
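For readers following the note above, here is a minimal, hedged sketch (not part of this commit) of how the compile-cache workaround could be applied from inside a Python script rather than on the command line: it sets `VLLM_DISABLE_COMPILE_CACHE=1` via `os.environ` before vLLM is imported, then runs a single prompt through the quantized checkpoint. The prompt is illustrative; only `LLM`, `SamplingParams`, and the model id come from the README.

```Py
# Minimal sketch, assuming the offline example that this commit removes from the README.
# Setting the variable here is equivalent to running
# `VLLM_DISABLE_COMPILE_CACHE=1 python example.py` as the note suggests.
import os

# Must be set before vLLM spins up its engine so the compile cache stays disabled.
os.environ["VLLM_DISABLE_COMPILE_CACHE"] = "1"

from vllm import LLM, SamplingParams

if __name__ == "__main__":
    # Load the int4 weight-only quantized checkpoint referenced in the README.
    llm = LLM(model="pytorch/Qwen3-8B-int4wo-hqq")
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    # One illustrative prompt; replace with your own inputs.
    outputs = llm.generate(["The capital of France is"], sampling_params)
    print(outputs[0].outputs[0].text)
```

Exporting the variable in the shell, as the note shows, works just as well; the in-script version is only a convenience when the example is launched from an environment where setting shell variables is awkward.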
52