Update README.md
Browse files
README.md
CHANGED
|
@@ -98,7 +98,11 @@ print(f"{save_to} model:", benchmark_fn(quantized_model.generate, **inputs, max_
|
|
| 98 |
```
|
| 99 |
|
| 100 |
# Serving with vllm
|
| 101 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 102 |
```
|
| 103 |
vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
|
| 104 |
```
|
|
|
|
| 98 |
```
|
| 99 |
|
| 100 |
# Serving with vllm
|
| 101 |
+
Need to install vllm nightly to get some recent changes
|
| 102 |
+
```
|
| 103 |
+
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
|
| 104 |
+
```
|
| 105 |
+
|
| 106 |
```
|
| 107 |
vllm serve pytorch/Phi-4-mini-instruct-int4wo-hqq --tokenizer microsoft/Phi-4-mini-instruct -O3
|
| 108 |
```
|