Update README.md
README.md
CHANGED

@@ -338,7 +338,7 @@ Note: you can change the number of prompts to be benchmarked with `--num-prompts`
 Server:
 ```Shell
 export MODEL=Qwen/Qwen3-8B
-vllm serve $MODEL --tokenizer
+vllm serve $MODEL --tokenizer $MODEL -O3
 ```
 
 Client:

@@ -351,7 +351,7 @@ python benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --
 Server:
 ```Shell
 export MODEL=pytorch/Qwen3-8B-int4wo-hqq
-VLLM_DISABLE_COMPILE_CACHE=1 vllm serve $MODEL --tokenizer
+VLLM_DISABLE_COMPILE_CACHE=1 vllm serve $MODEL --tokenizer $MODEL -O3 --pt-load-map-location cuda:0
 ```
 
 Client: