pytorch/Qwen3-32B-FP8

SocialLocalMobile committed on May 7

Commit 1f0b7e0 · verified · 1 Parent(s): 09188dc

Update README.md

Files changed (1)

README.md +2 -0

@@ -14,10 +14,12 @@ pipeline_tag: text-generation
 # 1. Inference with vLLM
 ```Shell
+# Server
 VLLM_DISABLE_COMPILE_CACHE=1 vllm serve SocialLocalMobile/Qwen3-32B-float8dq --tokenizer Qwen/Qwen3-32B -O3
 ```
 ```Shell
+# Client
 curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "SocialLocalMobile/Qwen3-32B-float8dq",
   "messages": [