After deploying with vLLM, how do I call the API?

#32
by hhy150 - opened
python -m vllm.entrypoints.openai.api_server \
    --model $MODEL_PATH \
    --host 0.0.0.0 \
    --served-model-name $MODEL_NAME \
    --port 8083 \
    --gpu-memory-utilization 0.3 \
    --tensor-parallel-size 8
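As a quick sanity check that the server came up and to confirm its base path, you can query the models endpoint (this is a sketch; it assumes the server launched above is running on port 8083):

```shell
# vLLM's OpenAI-compatible server serves its routes under /v1/.
# This lists the registered model(s) and confirms the served model name.
curl http://localhost:8083/v1/models
```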

With this setup, how do I access it? The following command returns "not found":

curl -X POST http://localhost:8083/embed \
    -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
    -H "Content-Type: application/json"
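The 404 is likely because `/embed` is a Text Embeddings Inference (TEI) route, not a vLLM one. vLLM's OpenAI-compatible server exposes embeddings at `/v1/embeddings` and expects OpenAI-style fields: `model` (matching `--served-model-name`) and `input`. A sketch of the corrected call, assuming `$MODEL_NAME` is the name used at launch:

```shell
# OpenAI-style embeddings request against the vLLM server started above.
# "model" must match the --served-model-name value; "input" replaces "inputs".
curl -X POST http://localhost:8083/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "'"$MODEL_NAME"'",
        "input": [
            "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?",
            "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"
        ]
    }'
```

Note that the model also has to be served in embedding mode for this route to work; in recent vLLM versions that means passing something like `--task embed` at launch (flag name may differ by version).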
