Update README.md
README.md
CHANGED
@@ -19,6 +19,16 @@ tags:
 
 The **Qwen2-VL-OCR-2B-Instruct** model is a fine-tuned version of **Qwen/Qwen2-VL-2B-Instruct**, tailored for tasks that involve **Optical Character Recognition (OCR)**, **image-to-text conversion**, and **math problem solving with LaTeX formatting**. This model integrates a conversational approach with visual and textual understanding to handle multi-modal tasks effectively.
 
+#### Key Enhancements:
+
+* **State-of-the-art understanding of images at various resolutions and aspect ratios**: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
+
+* **Understanding videos of 20+ minutes**: Qwen2-VL can understand videos longer than 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
+
+* **Agent that can operate mobile phones, robots, etc.**: With its complex reasoning and decision-making abilities, Qwen2-VL can be integrated with devices such as mobile phones and robots to operate them automatically based on the visual environment and text instructions.
+
+* **Multilingual support**: To serve global users, Qwen2-VL now understands text inside images in languages besides English and Chinese, including most European languages, Japanese, Korean, Arabic, and Vietnamese.
+
 | **File Name** | **Size** | **Description** | **Upload Status** |
 |---------------------------|------------|------------------------------------------------|-------------------|
 | `.gitattributes` | 1.52 kB | Configures LFS tracking for specific model files. | Initial commit |