After deploying with vLLM, how do I call the API?
#32 opened by hhy150
python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_PATH \
    --host 0.0.0.0 \
    --served-model-name $MODEL_NAME \
    --port 8083 \
    --gpu-memory-utilization 0.3 \
    --tensor-parallel-size 8
How do I query the server started this way? The following command returns `not found`:
curl -X POST http://localhost:8083/embed \
-d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
-H "Content-Type: application/json"
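The `not found` is most likely because vLLM's OpenAI-compatible server does not expose an `/embed` route; that path belongs to HuggingFace's text-embeddings-inference. vLLM serves embeddings at `/v1/embeddings` with an OpenAI-style body (`model` plus `input`), where `model` must match the `--served-model-name` you launched with, and the served model has to support embeddings. A minimal sketch of building such a request (the model name `my-embed-model` and port 8083 are taken from the launch command above; the helper name is my own):

```python
import json
import urllib.request

def build_embedding_request(model, inputs, base_url="http://localhost:8083"):
    """Build an OpenAI-style /v1/embeddings request for a vLLM server.

    `model` must equal the name passed via --served-model-name;
    `inputs` is a list of strings to embed.
    """
    payload = {"model": model, "input": inputs}
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (only works with the server actually running):
#   with urllib.request.urlopen(build_embedding_request("my-embed-model",
#           ["Query: What is the capital of China?"])) as resp:
#       result = json.load(resp)
#   The vectors are under result["data"][i]["embedding"].
```

The equivalent curl would POST the same JSON to `http://localhost:8083/v1/embeddings` instead of `/embed`.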