Note: latency results (benchmark_latency) are reported in seconds, and serving results (benchmark_serving) in requests per second.

## Setup
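To make the two units concrete, here is a small illustrative calculation; the numbers and variable names are hypothetical, not taken from the benchmark output:

```python
# Hypothetical numbers illustrating the two units the benchmarks report.
latency_s = 2.5           # benchmark_latency: end-to-end latency, in seconds
serving_req_per_s = 12.0  # benchmark_serving: throughput, in requests per second

# A single stream issuing back-to-back requests would sustain 1 / latency.
single_stream_req_per_s = 1.0 / latency_s

# Concurrency the server effectively needs to hit the measured throughput.
implied_concurrency = serving_req_per_s / single_stream_req_per_s
print(f"single stream: {single_stream_req_per_s:.2f} req/s, "
      f"implied concurrency: {implied_concurrency:.0f}")
```

The gap between the two numbers is why the serving benchmark is measured separately: batching across concurrent requests raises throughput far beyond what single-request latency alone suggests.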

Install the vllm nightly build to pick up some recent changes:
```Shell
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```

Get the vllm source code:
```Shell
git clone git@github.com:vllm-project/vllm.git
```

Run the benchmarks under the `vllm` root folder:

## benchmark_latency

### baseline
```Shell
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
```
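To compare several sequence lengths, the command can be scripted as a sweep. The sketch below only echoes the commands (drop the `echo` to actually run them, which requires a GPU and the vllm install above); the length values are arbitrary examples:

```Shell
# Print one benchmark_latency command per sequence length; remove "echo" to run.
for len in 128 256 512; do
  echo python benchmarks/benchmark_latency.py --input-len "$len" --output-len "$len"
done
```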

## benchmark_serving

We benchmarked the throughput in a serving environment.

Download the ShareGPT dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`

Other datasets can be found at: https://github.com/vllm-project/vllm/tree/main/benchmarks
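The sketch below shows how such a conversation dataset can be turned into prompt/response pairs for a serving benchmark. The record layout (a list of entries, each with a `conversations` list of `{"from", "value"}` turns) is an assumption about the ShareGPT JSON; inspect the downloaded file before relying on it, and note this is not the actual benchmark_serving loader.

```python
import json

# Tiny in-memory sample mimicking the assumed ShareGPT layout.
sample = json.loads("""
[
  {"id": "ex1", "conversations": [
    {"from": "human", "value": "Hello"},
    {"from": "gpt", "value": "Hi! How can I help?"}
  ]},
  {"id": "ex2", "conversations": []}
]
""")

# Keep records with at least a prompt/response pair, as a serving benchmark
# typically would, and pair each prompt with its reference completion.
pairs = [
    (rec["conversations"][0]["value"], rec["conversations"][1]["value"])
    for rec in sample
    if len(rec["conversations"]) >= 2
]
print(f"{len(pairs)} usable prompt/response pair(s)")
```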

### baseline

Server:
```Shell