|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
- ar |
|
|
library_name: transformers |
|
|
tags: |
|
|
- unsloth |
|
|
- qwen |
|
|
- qwen2.5-vl |
|
|
- arabic |
|
|
- ocr |
|
|
- vision |
|
|
- text-extraction |
|
|
- merged |
|
|
- lora |
|
|
pipeline_tag: image-to-text |
|
|
base_model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit |
|
|
--- |
|
|
|
|
|
# ArabicOCR-Qwen2.5-VL-7B-Vision |
|
|
|
|
|
This repository contains the `float16` merged version of a Vision-Language Model (VLM), fine-tuned by **loay** for the specific task of performing Optical Character Recognition (OCR) on Arabic text from images. |
|
|
|
|
|
The model was created by fine-tuning the `unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit` model using LoRA adapters. The high-performance training was made possible by the **Unsloth** library, and the adapters were then merged back into the base model for easy deployment. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Fine-tuned by:** [loay](https://huggingface.co/loay) |
|
|
- **Base Model:** `unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit` |
|
|
- **Fine-tuning Task:** Arabic Optical Character Recognition (OCR) |
|
|
- **Training Data:** The model was trained on a curated dataset of images containing Arabic text and their corresponding transcriptions. |
|
|
- **Output Format:** This is a `float16` precision model, ideal for inference on GPUs with sufficient VRAM (requires >14GB). |