
HuggingFaceTB/SmolVLM-Instruct • Image-Text-to-Text • 2B • Updated • 71.3k • 535
Generate answers by combining text and images
A community project to create an image preferences dataset.
Generate clickable coordinates on a screenshot