Note: latency results (benchmark_latency) are reported in seconds, and serving results (benchmark_serving) in requests per second.

## Setup
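To make the two units concrete, here is a small illustrative calculation; the numbers and variable names are hypothetical, not taken from the benchmark output:

```python
# Hypothetical numbers illustrating the two units the benchmarks report.
latency_s = 2.5           # benchmark_latency: end-to-end latency, in seconds
serving_req_per_s = 12.0  # benchmark_serving: throughput, in requests per second

# A single stream issuing back-to-back requests would sustain 1 / latency.
single_stream_req_per_s = 1.0 / latency_s

# Concurrency the server effectively needs to hit the measured throughput.
implied_concurrency = serving_req_per_s / single_stream_req_per_s
print(f"single stream: {single_stream_req_per_s:.2f} req/s, "
      f"implied concurrency: {implied_concurrency:.0f}")
```

The gap between the two numbers is why the serving benchmark is measured separately: batching across concurrent requests raises throughput far beyond what single-request latency alone suggests.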

Install the vllm nightly build to pick up some recent changes:
```Shell
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```

Get the vllm source code:
```Shell
git clone git@github.com:vllm-project/vllm.git
```

Run the benchmarks under the `vllm` root folder:

## benchmark_latency

### baseline
```Shell
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
```
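To compare several sequence lengths, the command can be scripted as a sweep. The sketch below only echoes the commands (drop the `echo` to actually run them, which requires a GPU and the vllm install above); the length values are arbitrary examples:

```Shell
# Print one benchmark_latency command per sequence length; remove "echo" to run.
for len in 128 256 512; do
  echo python benchmarks/benchmark_latency.py --input-len "$len" --output-len "$len"
done
```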

## benchmark_serving

We benchmarked the throughput in a serving environment.

Download the ShareGPT dataset: `wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json`

Other datasets can be found at: https://github.com/vllm-project/vllm/tree/main/benchmarks
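The sketch below shows how such a conversation dataset can be turned into prompt/response pairs for a serving benchmark. The record layout (a list of entries, each with a `conversations` list of `{"from", "value"}` turns) is an assumption about the ShareGPT JSON; inspect the downloaded file before relying on it, and note this is not the actual benchmark_serving loader.

```python
import json

# Tiny in-memory sample mimicking the assumed ShareGPT layout.
sample = json.loads("""
[
  {"id": "ex1", "conversations": [
    {"from": "human", "value": "Hello"},
    {"from": "gpt", "value": "Hi! How can I help?"}
  ]},
  {"id": "ex2", "conversations": []}
]
""")

# Keep records with at least a prompt/response pair, as a serving benchmark
# typically would, and pair each prompt with its reference completion.
pairs = [
    (rec["conversations"][0]["value"], rec["conversations"][1]["value"])
    for rec in sample
    if len(rec["conversations"]) >= 2
]
print(f"{len(pairs)} usable prompt/response pair(s)")
```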

### baseline

Server:
```Shell