Update README.md
README.md
@@ -113,7 +113,7 @@ lm_eval --model hf --model_args pretrained=microsoft/Phi-4-mini-instruct --tasks
 
 ## float8dq
 ```
-lm_eval --model hf --model_args pretrained=
+lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-float8dq --tasks hellaswag --device cuda:0 --batch_size 8
 ```
 
 `TODO: more complete eval results`
@@ -163,7 +163,7 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 ### float8dq
 ```
-python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
+python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model pytorch/Phi-4-mini-instruct-float8dq --batch-size 1
 ```
 
 ## benchmark_serving
@@ -186,7 +186,7 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 ### float8dq
 Server:
 ```
-vllm serve
+vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```
 
 Client:
@@ -197,5 +197,5 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 # Serving with vllm
 We can use the same command we used in serving benchmarks to serve the model with vllm
 ```
-vllm serve
+vllm serve pytorch/Phi-4-mini-instruct-float8dq --tokenizer microsoft/Phi-4-mini-instruct -O3
 ```