Update README.md
README.md (changed)
@@ -282,10 +282,8 @@ Our int4wo is only optimized for batch size 1, so expect some slowdown with larg
 | Benchmark (Latency)              |                |                          |
 |----------------------------------|----------------|--------------------------|
 |                                  | Qwen3-8B       | Qwen3-8B-int4wo-hqq      |
-| latency (batch_size=1)           | 3.52s          | 2.84s (24% speedup)
-| serving (num_prompts=1)          | 0.64 req/s     | 0.79 req/s (23% speedup) |
+| latency (batch_size=1)           | 3.52s          | 2.84s (24% speedup)      |

-Note the result of latency (benchmark_latency) is in seconds, and serving (benchmark_serving) is in number of requests per second.
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.
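The removed note names benchmark_latency and benchmark_serving, presumably vLLM's benchmark scripts (an assumption; the diff itself only gives the script names). As a rough illustration of the remaining batch-size=1 latency comparison, here is a minimal sketch using vLLM's Python API. The model identifiers, prompt, and token lengths are placeholders chosen for the example, and the README's published numbers come from the benchmark scripts, not from this snippet.

```python
# Minimal sketch, not the script behind the README numbers: those come from
# benchmark_latency / benchmark_serving (assumed here to be vLLM's benchmark
# scripts). Model ids, prompt, and token lengths below are illustrative.
import time

from vllm import LLM, SamplingParams


def batch1_latency(model_id: str, num_iters: int = 5) -> float:
    """Average end-to-end latency (seconds) for a single prompt, batch_size=1."""
    llm = LLM(model=model_id)
    params = SamplingParams(temperature=0.0, max_tokens=128)  # short output length
    prompts = ["Summarize int4 weight-only quantization in one paragraph."]

    llm.generate(prompts, params)  # warm-up so one-time setup is not timed

    start = time.perf_counter()
    for _ in range(num_iters):
        llm.generate(prompts, params)
    return (time.perf_counter() - start) / num_iters


if __name__ == "__main__":
    # The int4wo checkpoint id is a placeholder for the quantized model's path.
    for model_id in ("Qwen/Qwen3-8B", "Qwen3-8B-int4wo-hqq"):
        print(f"{model_id}: {batch1_latency(model_id):.2f}s")
```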