prithivMLmods/Qwen2-VL-OCR-2B-Instruct

prithivMLmods committed on Dec 19, 2024

Commit 14e4776 · verified · 1 parent: b249d4b

Update README.md

Files changed (1)

README.md +10 -0

@@ -19,6 +19,16 @@ tags:
 The **Qwen2-VL-OCR-2B-Instruct** model is a fine-tuned version of **Qwen/Qwen2-VL-2B-Instruct**, tailored for tasks that involve **Optical Character Recognition (OCR)**, **image-to-text conversion**, and **math problem solving with LaTeX formatting**. This model integrates a conversational approach with visual and textual understanding to handle multi-modal tasks effectively.
+#### Key Enhancements:
+* **SoTA understanding of images of various resolution & ratio**: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
+* **Understanding videos of 20min+**: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
+* **Agent that can operate your mobiles, robots, etc.**: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.
+* **Multilingual Support**: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
 | **File Name**             | **Size**   | **Description**                                 | **Upload Status** |
 |---------------------------|------------|------------------------------------------------|-------------------|
 | `.gitattributes`          | 1.52 kB   | Configures LFS tracking for specific model files. | Initial commit    |