update doc (#2)
Browse files- update doc (948e645df6573cbfa9a4d03b628e14866ab55a61)
Co-authored-by: asher <[email protected]>
README.md
CHANGED
|
@@ -168,7 +168,7 @@ docker run --privileged --user root --net=host --ipc=host \
|
|
| 168 |
--gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
|
| 169 |
\
|
| 170 |
-m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
|
| 171 |
-
--tensor-parallel-size 2 --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
|
| 172 |
|
| 173 |
```
|
| 174 |
|
|
@@ -177,14 +177,17 @@ model downloaded by modelscope:
|
|
| 177 |
docker run --privileged --user root --net=host --ipc=host \
|
| 178 |
-v ~/.cache/modelscope:/root/.cache/modelscope \
|
| 179 |
--gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
|
| 180 |
-
-m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 2 --port 8000 \
|
| 181 |
--model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust_remote_code
|
| 182 |
```
|
| 183 |
|
|
|
|
|
|
|
|
|
|
| 184 |
|
| 185 |
### SGLang
|
| 186 |
|
| 187 |
-
Support for INT4 quantization on sglang is in progress and will be available in a future update.
|
| 188 |
|
| 189 |
## Contact Us
|
| 190 |
|
|
|
|
| 168 |
--gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
|
| 169 |
\
|
| 170 |
-m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
|
| 171 |
+
--tensor-parallel-size 2 --quantization gptq_marlin --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
|
| 172 |
|
| 173 |
```
|
| 174 |
|
|
|
|
| 177 |
docker run --privileged --user root --net=host --ipc=host \
|
| 178 |
-v ~/.cache/modelscope:/root/.cache/modelscope \
|
| 179 |
--gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
|
| 180 |
+
-m vllm.entrypoints.openai.api_server --host 0.0.0.0 --quantization gptq_marlin --tensor-parallel-size 2 --port 8000 \
|
| 181 |
--model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust_remote_code
|
| 182 |
```
|
| 183 |
|
| 184 |
+
### TensorRT-LLM
|
| 185 |
+
|
| 186 |
+
Support for INT4 quantization on TensorRT-LLM for this model is in progress and will be available in a future update.
|
| 187 |
|
| 188 |
### SGLang
|
| 189 |
|
| 190 |
+
Support for INT4 quantization on sglang for this model is in progress and will be available in a future update.
|
| 191 |
|
| 192 |
## Contact Us
|
| 193 |
|