Vision - a melvindave Collection

Vision

updated 22 days ago

Running on CPU Upgrade

954

Open VLM Leaderboard

VLMEvalKit Evaluation Results Collection

Running on Zero

Featured

315

DeepSeek OCR Demo

Try out DeepSeek-OCR on your PDFs or images

Running on Zero

MCP

53

Multimodal OCR3

nanonets2-ocr / chandra-ocr / dots.ocr / olm-ocr2

Qwen/Qwen3-VL-30B-A3B-Instruct

Image-Text-to-Text • 31B • Updated Nov 26, 2025 • 1.32M • 479

Note: running locally in LM Studio

Qwen/Qwen3-VL-235B-A22B-Thinking

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 38k • 357

Note: inference available

Qwen/Qwen3-VL-235B-A22B-Instruct

Image-Text-to-Text • 236B • Updated Nov 26, 2025 • 230k • 348

Note: inference available

Qwen/Qwen2.5-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Apr 6, 2025 • 2.44M • 1.41k

zai-org/GLM-4.6V

Image-Text-to-Text • 108B • Updated 24 days ago • 185k • 353

Running on Zero

Featured

111

VLM Object Understanding

Explore object detection, visual grounding, keypoint Detecti