Update README.md

README.md

@@ -98,7 +98,11 @@ print(f"{save_to} model:", benchmark_fn(quantized_model.generate, **inputs, max_
 ```
 # Serving with vllm
-We can use the same command we used in serving benchmarks to serve the model with vllm
+Need to install vllm nightly to get some recent changes
+```
+pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
+```
 ```
 vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```
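
Once the `vllm serve` command above is running, the model can be queried over vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch, not part of this change: it assumes the server is on its default address (`http://localhost:8000`) and that the served model name is the path passed to `vllm serve`. The helper names `build_completion_request` and `complete` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumption: vllm serve is listening on its default host and port.
BASE_URL = "http://localhost:8000"
# Assumption: the served model name defaults to the path given to `vllm serve`.
MODEL = "pytorch/Phi-4-mini-instruct-int4wo-hqq"


def build_completion_request(prompt, max_tokens=64):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=64):
    """Send the request and return the generated text; needs a running server."""
    request = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server up, `complete("What is int4 weight-only quantization?")` should return the generated text. Building the request separately from sending it keeps the payload easy to inspect without a live server.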