Image-Text-to-Text
Transformers
Safetensors
English
qwen2_5_vl
multimodal
conversational
Eval Results
text-generation-inference
Instructions for using Qwen/Qwen2.5-VL-7B-Instruct with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Qwen/Qwen2.5-VL-7B-Instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen2.5-VL-7B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
print(pipe(text=messages))
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
# Render the chat template, tokenize the text, and preprocess the image in one call
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# generate() returns prompt + new tokens; decode only the new ones
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
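The direct-loading snippet decodes only the tail of `outputs[0]` because `generate()` returns the prompt tokens followed by the newly generated ones. A minimal sketch of the same trimming for a whole batch, using plain Python lists in place of tensors; `trim_prompt_tokens` is a hypothetical helper, not part of Transformers. It assumes every output row starts with its (possibly left-padded) prompt row, which is how batched `generate()` output is laid out for decoder-only models.

```python
from typing import List, Sequence


def trim_prompt_tokens(
    input_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the echoed prompt from each generated sequence (hypothetical helper)."""
    if len(input_ids) != len(output_ids):
        raise ValueError("input and output batch sizes differ")
    trimmed = []
    for prompt, full in zip(input_ids, output_ids):
        # Sanity check: each output row should begin with its own prompt row.
        if list(full[: len(prompt)]) != list(prompt):
            raise ValueError("output row does not start with its prompt")
        # Everything after the prompt is newly generated text.
        trimmed.append(list(full[len(prompt):]))
    return trimmed
```

For example, `trim_prompt_tokens([[1, 2, 3]], [[1, 2, 3, 7, 8]])` returns `[[7, 8]]`; the trimmed rows can then be passed to `processor.batch_decode`.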
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Qwen/Qwen2.5-VL-7B-Instruct with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Qwen/Qwen2.5-VL-7B-Instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
Use Docker
docker model run hf.co/Qwen/Qwen2.5-VL-7B-Instruct
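The curl call to the vLLM server can also be made from Python. A minimal sketch using only the standard library (`json`, `urllib.request`), assuming the server is listening at `http://localhost:8000` as in the curl example; `build_chat_request` and `post_chat` are hypothetical helper names. The request body mirrors the JSON that curl sends: one text part followed by one `image_url` part.

```python
import json
import urllib.request
from typing import Any, Dict


def build_chat_request(model: str, prompt: str, image_url: str) -> Dict[str, Any]:
    """Build an OpenAI-style chat completions body with one text and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(base_url: str, payload: Dict[str, Any], timeout: float = 120.0) -> Dict[str, Any]:
    """POST the payload to <base_url>/v1/chat/completions and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_chat("http://localhost:8000", build_chat_request("Qwen/Qwen2.5-VL-7B-Instruct", "Describe this image in one sentence.", image_url))` sends the same request as the curl command. Building the payload separately keeps it easy to inspect or log without a live server.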
- SGLang
How to use Qwen/Qwen2.5-VL-7B-Instruct with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen2.5-VL-7B-Instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/Qwen2.5-VL-7B-Instruct" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    { "type": "text", "text": "Describe this image in one sentence." },
                    { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use Qwen/Qwen2.5-VL-7B-Instruct with Docker Model Runner:
docker model run hf.co/Qwen/Qwen2.5-VL-7B-Instruct
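The vLLM (port 8000) and SGLang (port 30000) servers shown above both answer at `/v1/chat/completions` in the OpenAI chat completions format, where the generated text sits in `choices[0].message.content`. A minimal sketch of choosing the endpoint and pulling the reply text out of an already-decoded JSON response; `CHAT_ENDPOINTS`, `chat_url`, and `extract_reply` are hypothetical names, and the ports assume the defaults used in the commands above.

```python
from typing import Any, Dict

# Base URLs taken from the serve commands above (assumed defaults; adjust to your setup).
CHAT_ENDPOINTS: Dict[str, str] = {
    "vllm": "http://localhost:8000",
    "sglang": "http://localhost:30000",
}


def chat_url(backend: str) -> str:
    """Return the OpenAI-compatible chat completions URL for a backend name."""
    try:
        base = CHAT_ENDPOINTS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(CHAT_ENDPOINTS)}"
        ) from None
    return f"{base}/v1/chat/completions"


def extract_reply(response: Dict[str, Any]) -> str:
    """Return the assistant text from the first choice of a chat completions response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError(f"response has no choices: {response!r}")
    content = choices[0].get("message", {}).get("content")
    if not isinstance(content, str):
        raise ValueError("first choice has no text content")
    return content
```

Failing loudly on an empty `choices` list, rather than returning an empty string, makes server-side errors (which come back as JSON without `choices`) easier to spot.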
Community discussions:
- #70 Add MMMU-Pro evaluation result, opened 8 days ago by SaylorTwift
- #69 Qwen2.57Binit, opened 2 months ago by miaohuairui
- #67 Add ScreenSpot-Pro evaluation result (Qwen2.5-VL-7B-Instruct), opened 2 months ago by merve
- #66 Add missing Apache-2.0 license (aligned with other Qwen model repos), opened 3 months ago by angel-al
- #65 Report Model not working via Hugging Face Router (50001 Internal Error), opened 4 months ago by Mustafahmdan
- #64 The model suddenly is not working, opened 4 months ago by Mustafahmdan
- #63 Token Count Calculation in SFT Data Distribution Curation, opened 6 months ago by tcy006
- #62 Transformers Version Error, opened 6 months ago by thyjeff
- #60 Optimal image format and size, opened 7 months ago by yiftachd
- #58 Why tie_word_embeddings is False for 7B model?, opened 10 months ago by pengzhenghao97
- #56 Update README.md, opened 10 months ago by stevezkw
- #55 Fix Video Inference - TypeError: process_vision_info() got an unexpected keyword argument 'return_video_kwargs', opened 10 months ago by developer0hye
- #54 Batch inference gives addCriterion, opened 11 months ago by clyao-123
- #53 Output logits differ significantly for different attn_implementations on image inputs, opened 11 months ago by peefcat
- #52 Confusion with the parameter fixed in visual ViT merger in source code, opened 11 months ago by Bytes-Lin
- #51 Update README.md, opened 11 months ago by xiaowei4ai
- #50 In-context learning, opened 12 months ago by Gilad-Deutch
- #49 "ValueError: Image features and image tokens do not match: tokens: 0, features 2852", opened 12 months ago by jdkruzr
- #48 Context Length, opened about 1 year ago by Fujiao
- #46 Model cannot be downloaded, opened about 1 year ago by sadabshiper
- #44 Support function calling?, opened about 1 year ago by tasia
- #43 Video Inference --> RuntimeError: torch.cat(): expected a non-empty list of Tensors, opened about 1 year ago by caput
- #42 Inference on Tesla P40*2 or RTX 2080Ti *2, after fixing a bug(?), opened about 1 year ago by luweigen
- #40 Update README.md, opened about 1 year ago by megladagon
- #39 Chat template does not work, opened about 1 year ago by tonydavis629
- #38 Poor performance with simple table extraction task, opened about 1 year ago by hanshupe
- #37 Base model for Qwen2.5-VL-7B-Instruct, opened about 1 year ago by zzlynxSG
- #36 Request: DOI, opened about 1 year ago by mojan3
- #33 pixel_values to RGB image, opened over 1 year ago by ococq
- #32 Update README.md, opened over 1 year ago by teowu
- #31 Qwen2.5-VL-7B-Instruct License?, opened over 1 year ago by radames
- #30 Add link to paper page, opened over 1 year ago by nielsr
- #29 For now, use this version of Transformers for vLLM, opened over 1 year ago by mkvn
- #28 Fine-tuning for Image Captioning and Image QA with Segmented Images, opened over 1 year ago by badhon1512
- #27 Model architectures ['Qwen2_5_VLForConditionalGeneration'] are not supported for now, opened over 1 year ago by liujh123
- #26 No module named 'transformers.models.qwen2_5_vl.image_processing_qwen2_5_vl', opened over 1 year ago by risonlovesakura
- #25 XXX, opened over 1 year ago by mpasternak
- #23 finetune, opened over 1 year ago by andrewdeeplearning
- #21 [Error] (Qwen2.5-VL-instruct-7B) Qwen2TokenizerFast object has no attribute 'tokenizer', opened over 1 year ago by yus002
- #20 Vision tokens missing from chat template, opened over 1 year ago by depasquale
- #19 Batch inference throws an error, opened over 1 year ago by dingguofeng
- #18 Hardware and VRAM requirements to run this model?, opened over 1 year ago by IronmanSnap
- #16 Request: DOI, opened over 1 year ago by arashinokage
- #15 Can we use tools with this model?, opened over 1 year ago by barleyspectacular
- #13 Bounding boxes coordinates, opened over 1 year ago by ljoana
- #12 What engine can I use to deploy this model?, opened over 1 year ago by jjovalle99
- #11 Finetuning scripts for Qwen2.5-VL (No llama-factory), opened over 1 year ago by 2U1