Alaa Aljabari committed on
Commit bacb3bc · 1 Parent(s): 52f11f0

Updated README

Files changed (1)
  1. README.md +58 -39
README.md CHANGED
@@ -1,64 +1,83 @@
- ---
- license: mit
- ---
- =======
  ---
  library_name: peft
- license: other
  base_model: Qwen/Qwen2.5-VL-7B-Instruct
  tags:
- - llama-factory
  - lora
- - generated_from_trainer
  model-index:
- - name: qwen2_5vl_arabic_model
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # qwen2_5vl_arabic_model

- This model is a fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on the arabic_captions dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 1
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 16
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 15.0
- - mixed_precision_training: Native AMP

- ### Training results

-
- ### Framework versions
-
  - PEFT 0.15.2
  - Transformers 4.49.0
- - Pytorch 2.4.1+cu121
- - Datasets 3.6.0
- - Tokenizers 0.21.1
  ---
  library_name: peft
+ license: mit
  base_model: Qwen/Qwen2.5-VL-7B-Instruct
  tags:
+ - arabic
+ - image-captioning
+ - vision-language
  - lora
+ - qwen2.5-vl
+ - cultural-heritage
+ language:
+ - ar
  model-index:
+ - name: arabic-image-captioning-qwen2.5vl
  results: []
  ---
 
+ # Arabic Image Captioning - Qwen2.5-VL Fine-tuned

+ This model is a LoRA fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for generating Arabic captions for images.

+ ## Model Description

+ This model was developed as part of the [Arabic Image Captioning Shared Task 2025](https://sina.birzeit.edu/image_eval2025/index.html). It generates natural Arabic captions for images, with a focus on historical and cultural content related to Palestinian heritage.

+ ## Usage

+ ```python
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
+ from peft import PeftModel
+ import torch
+ from PIL import Image
+
+ # Load base model and processor (Qwen2.5-VL uses the Qwen2_5_VL* model class)
+ base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+     "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
+ )
+ processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
+
+ # Load LoRA adapter
+ model = PeftModel.from_pretrained(base_model, "your-username/arabic-image-captioning-qwen2.5vl")
+
+ # Build a chat-formatted prompt; the template inserts the image placeholder tokens
+ image = Image.open("your_image.jpg")
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image"},
+             {"type": "text", "text": "اكتب وصفاً مختصراً لهذه الصورة باللغة العربية"},
+         ],
+     }
+ ]
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # Process image and generate caption
+ inputs = processor(images=image, text=text, return_tensors="pt").to(model.device)
+ with torch.no_grad():
+     outputs = model.generate(**inputs, max_new_tokens=128)
+
+ # Decode only the newly generated tokens
+ caption = processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(caption)
+ ```
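For deployment-only use, the LoRA weights can also be folded into the base model so that PEFT is not needed at inference time. The snippet below is a minimal sketch using PEFT's standard `merge_and_unload()`; it assumes the `model` and `processor` objects from the usage example above, and the output directory name is only an example.

```python
# Optional: merge the LoRA adapter into the base weights for standalone inference.
# Assumes `model` (the PeftModel) and `processor` come from the usage example above;
# the output directory name is arbitrary.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen2.5-vl-arabic-captioning-merged")
processor.save_pretrained("qwen2.5-vl-arabic-captioning-merged")
```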
 
+ ## Training Details

+ ### Dataset
+ - **Training data**: Arabic image captions dataset from the shared task
+ - **Languages**: Arabic (ar)
+ - **Dataset size**: ~2,700 training images with Arabic captions
 
+ ### Training Procedure
+ - **Fine-tuning method**: LoRA (Low-Rank Adaptation)
+ - **Training epochs**: 15
+ - **Learning rate**: 2e-05
+ - **Batch size**: 1 with gradient accumulation (effective batch size: 16)
+ - **Optimizer**: AdamW with cosine learning rate scheduling
+ - **Hardware**: NVIDIA A100 GPU
+ - **Training time**: ~6 hours
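As an illustration of how the settings listed above could be expressed in code, the sketch below uses PEFT's `LoraConfig` and the Transformers `TrainingArguments`. It is not the authors' training script: the LoRA rank, alpha, dropout, and target modules are assumptions (the card does not report them); only the epochs, learning rate, batch size, accumulation, optimizer, and scheduler come from the card, and the warmup ratio and mixed precision come from the earlier version of this README.

```python
# Illustrative configuration sketch; hyperparameters marked "assumed" are not reported in the card.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                      # assumed rank (not reported)
    lora_alpha=32,             # assumed scaling (not reported)
    lora_dropout=0.05,         # assumed dropout (not reported)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="arabic-captioning-lora",  # example path
    num_train_epochs=15,
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,       # effective batch size 16
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                     # from the previous card version
    fp16=True,                            # native AMP mixed precision, per the previous card version
)
```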
 
+ ### Framework Versions
  - PEFT 0.15.2
  - Transformers 4.49.0
+ - PyTorch 2.4.1+cu121
+
+ ## Contact
+
+ For questions or support: