---
library_name: peft
license: mit
base_model: Qwen/Qwen2.5-VL-7B-Instruct
datasets:
- SinaLab/ImageEval2025Task2TrainDataset
tags:
- arabic
- image-captioning
- vision-language
- lora
- qwen2.5-vl
- cultural-heritage
language:
- ar
model-index:
- name: arabic-image-captioning-qwen2.5vl
results: []
---
# Arabic Image Captioning - Qwen2.5-VL Fine-tuned
This model is a LoRA fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for generating Arabic captions for images.
## Model Description
This model was developed as part of the [Arabic Image Captioning Shared Task 2025](https://sina.birzeit.edu/image_eval2025/index.html). It generates natural Arabic captions for images, with a focus on historical and cultural content related to Palestinian heritage.
Please refer to the [training dataset](https://huggingface.co/datasets/SinaLab/ImageEval2025Task2TrainDataset) for more details.
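For quick inspection, the training data can be loaded with the `datasets` library. The sketch below assumes the repository loads directly via `load_dataset` and that each example pairs an image with its Arabic caption; the field names used here (`image`, `caption`) are assumptions and may differ in the actual dataset.
```python
from datasets import load_dataset

# Load the shared-task training split from the Hugging Face Hub
dataset = load_dataset("SinaLab/ImageEval2025Task2TrainDataset", split="train")

# Inspect one example; "image" and "caption" are assumed field names
example = dataset[0]
print(example.keys())
print(example.get("caption"))
```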
## Usage
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image
# Load base model and processor (Qwen2.5-VL uses the Qwen2_5_VL* model classes)
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "your-username/arabic-image-captioning-qwen2.5vl")
# Build a chat-style prompt with the image and the Arabic instruction
# ("Write a brief description of this image in Arabic")
image = Image.open("your_image.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "اكتب وصفاً مختصراً لهذه الصورة باللغة العربية"},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(images=[image], text=[text], return_tensors="pt").to(model.device)
# Generate and decode only the newly generated tokens (skip the prompt)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
caption = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(caption)
```
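For deployment, the LoRA adapter can optionally be folded into the base weights using PEFT's standard `merge_and_unload` helper, so inference no longer goes through the adapter wrapper (the save path below is illustrative):
```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("arabic-captioning-merged")
processor.save_pretrained("arabic-captioning-merged")
```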
## Training Details
### Dataset
- **Training data**: Arabic image captions dataset from the shared task
- **Languages**: Arabic (ar)
- **Dataset size**: ~2,700 training images with Arabic captions
### Training Procedure
- **Fine-tuning method**: LoRA (Low-Rank Adaptation); see the configuration sketch after this list
- **Training epochs**: 15
- **Learning rate**: 2e-05
- **Batch size**: 1 with gradient accumulation (effective batch size: 16)
- **Optimizer**: AdamW with cosine learning rate scheduling
- **Hardware**: NVIDIA A100 GPU
- **Training time**: ~6 hours
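The hyperparameters above map onto a PEFT + `transformers` `Trainer` setup roughly as sketched below. This is illustrative only: the LoRA rank, alpha, dropout, target modules, and precision settings are assumptions not stated in this card, and the actual training script may differ.
```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import Qwen2_5_VLForConditionalGeneration, TrainingArguments

base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype=torch.bfloat16
)
# LoRA adapter configuration (rank, alpha, dropout, and targets are assumed values)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
# Training arguments matching the values listed above:
# 15 epochs, lr 2e-5, per-device batch size 1, effective batch size 16, AdamW + cosine schedule
training_args = TrainingArguments(
    output_dir="arabic-image-captioning-qwen2.5vl",
    num_train_epochs=15,
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    bf16=True,
)
```
A `Trainer` (or a custom training loop) would then combine `model`, `training_args`, and the processed image–caption pairs.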
### Framework Versions
- PEFT 0.15.2
- Transformers 4.49.0
- PyTorch 2.4.1+cu121
## Contact
For questions or support:
- [email protected]
- [email protected]
- [email protected]