base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
This example model demonstrates how to run an AutoRound-format vision-language model on vLLM. Some visual modules have been quantized to 8-bit precision.
This vLLM PR is required: https://github.com/vllm-project/vllm/pull/21802
vllm serve Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound --dtype bfloat16 --port 8001 --max-model-len 10000
curl --noproxy '*' http://localhost:8001/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
          }
        },
        {
          "type": "text",
          "text": "Please describe this image."
        }
      ]
    }
  ],
  "max_tokens": 512
}'
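
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, not an official client. It assumes the server is reachable at `http://localhost:8001` as in the curl command; `build_payload` and `describe_image` are hypothetical helper names introduced here for illustration.

```python
import json
import urllib.request

# Assumed to match the curl request above; adjust host and port to your setup.
BASE_URL = "http://localhost:8001/v1"
MODEL = "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound"
IMAGE_URL = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"


def build_payload(prompt, image_url=IMAGE_URL, max_tokens=512):
    """Build the same chat-completions body as the curl request."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def describe_image(prompt, base_url=BASE_URL, timeout=120):
    """POST the payload and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler bypasses system proxies, like curl's --noproxy '*'.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(describe_image("Please describe this image."))` should print the model's reply. Keeping payload construction separate from the HTTP call makes it easy to inspect or log the request body before sending it.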