---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---
This example model demonstrates how to serve a vision-language model in the AutoRound format with vLLM. Some visual modules are quantized to 8-bit precision.

This vLLM PR is required: https://github.com/vllm-project/vllm/pull/21802


~~~bash
# --port 8001 matches the curl request below (vLLM listens on 8000 by default)
vllm serve Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound --dtype bfloat16 --max-model-len 10000 --port 8001
~~~

~~~bash
curl --noproxy '*'   http://localhost:8001/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
            }
          },
          {
            "type": "text",
            "text": "Please describe this image."
          }
        ]
      }
    ],
    "max_tokens": 512
  }'
~~~
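
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server is reachable at `http://localhost:8001` as in the curl example. The helper names `build_payload` and `describe_image` are illustrative and not part of vLLM. Proxies are disabled to mirror `--noproxy '*'`.

```python
import json
import urllib.request

# Assumption: host and port match the curl example above.
BASE_URL = "http://localhost:8001/v1"
MODEL = "Intel/Qwen2.5-VL-7B-Instruct-int4-mixed-AutoRound"


def build_payload(image_url, prompt, max_tokens=512):
    """Build the same chat-completions body as the curl example."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def describe_image(image_url, prompt, base_url=BASE_URL):
    """POST the request and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(image_url, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler bypasses any configured proxy, like --noproxy '*'.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(describe_image("https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg", "Please describe this image."))` should print the model's description of the demo image.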