Omarrran committed on
Commit cc486cc · verified · 1 Parent(s): 0933001

Update README.md

Files changed (1)
  1. README.md +151 -3
README.md CHANGED
@@ -12,10 +12,158 @@ language:
 # Uploaded finetuned model

- - **Developed by:** Omarrran
+ - **Developed by:** Haq Nawaz Malik
  - **License:** apache-2.0
  - **Finetuned from model :** unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit

- This mllama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ # Documentation: Hnm_Llama3.2_(11B)-Vision_lora_model
+
+ ## Overview
+ The **Hnm_Llama3.2_(11B)-Vision_lora_model** is a fine-tuned version of **Llama 3.2 (11B) Vision** using **LoRA-based parameter-efficient fine-tuning (PEFT)**. It specializes in **vision-language tasks**, particularly **medical image captioning and understanding**.
+
+ This model was fine-tuned on a **Tesla T4 (Google Colab)** using **Unsloth**, a framework designed for efficient fine-tuning of large models.
+
+ ---
+
+ ## Features
+ - **Fine-tuned on Radiology Images**: Trained on the **Radiology_mini** dataset.
+ - **Supports Image Captioning**: Can describe medical images.
+ - **4-bit Quantization (QLoRA)**: Memory-efficient; runs on consumer GPUs.
+ - **LoRA-based PEFT**: Trains only about **1% of the parameters**, significantly reducing computational cost (see the sketch after this list).
+ - **Multi-modal Capabilities**: Works with both **text and image** inputs.
+ - **Supports both Vision and Language fine-tuning**.
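+
+ A minimal sketch of how such a LoRA setup attaches adapters and of verifying the trainable-parameter fraction, assuming Unsloth's `FastVisionModel.get_peft_model` API; the `r` and `lora_alpha` values are illustrative, not the exact configuration used for this model:
+
+ ```python
+ from unsloth import FastVisionModel
+
+ # Load the 4-bit quantized base model (QLoRA-style).
+ model, tokenizer = FastVisionModel.from_pretrained(
+     "unsloth/Llama-3.2-11B-Vision-Instruct",
+     load_in_4bit=True,
+ )
+
+ # Attach LoRA adapters to both the vision and language layers.
+ model = FastVisionModel.get_peft_model(
+     model,
+     finetune_vision_layers=True,
+     finetune_language_layers=True,
+     r=16,           # LoRA rank (illustrative)
+     lora_alpha=16,  # LoRA scaling factor (illustrative)
+ )
+
+ # Only the adapter weights require gradients.
+ trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
+ total = sum(p.numel() for p in model.parameters())
+ print(f"Trainable parameters: {trainable:,} / {total:,} ({trainable / total:.2%})")
+ ```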
+
+ ---
+
+ ## Model Details
+ - **Base Model**: `unsloth/Llama-3.2-11B-Vision-Instruct`
+ - **Fine-tuning Method**: LoRA + 4-bit Quantization (QLoRA)
+ - **Dataset**: `unsloth/Radiology_mini`
+ - **Framework**: Unsloth + Hugging Face Transformers
+ - **Training Environment**: Google Colab (Tesla T4 GPU)
+
+ ---
+
+ ## Installation & Setup
+ ### 1. Install Dependencies
+ ```bash
+ pip install unsloth transformers torch datasets
+ ```
+
+ ### 2. Load the Model
+ ```python
+ from unsloth import FastVisionModel
+
+ model, tokenizer = FastVisionModel.from_pretrained(
+     "Hnm_Llama3.2_(11B)-Vision_lora_model",  # local path or Hub repo id
+     load_in_4bit=True,  # set to False for full precision
+ )
+ ```
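+
+ To check the memory savings claimed in the Features section, you can inspect the loaded footprint with plain PyTorch; exact numbers vary by GPU and driver:
+
+ ```python
+ import torch
+
+ # Rough footprint of the 4-bit quantized model on the current GPU.
+ print(f"Allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB")
+ print(f"Reserved:  {torch.cuda.memory_reserved() / 1024**3:.1f} GiB")
+ ```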
+
+ ---
+
+ ## Usage
+ ### **1. Image Captioning Example**
+ ```python
+ import torch
+ from datasets import load_dataset
+ from transformers import TextStreamer
+
+ FastVisionModel.for_inference(model)  # Enable inference mode
+
+ # Load a sample image from the dataset
+ dataset = load_dataset("unsloth/Radiology_mini", split="train")
+ image = dataset[0]["image"]
+ instruction = "Describe this medical image accurately."
+
+ messages = [
+     {"role": "user", "content": [
+         {"type": "image"},
+         {"type": "text", "text": instruction}
+     ]}
+ ]
+
+ input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = tokenizer(
+     image,
+     input_text,
+     add_special_tokens=False,
+     return_tensors="pt"
+ ).to("cuda")
+
+ text_streamer = TextStreamer(tokenizer, skip_prompt=True)
+ _ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
+                    use_cache=True, temperature=1.5, min_p=0.1)
+ ```
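+
+ To caption a local image instead of a dataset sample, any PIL image can take the place of `image` above. A minimal sketch; the filename is hypothetical:
+
+ ```python
+ from PIL import Image
+
+ # Load your own image (hypothetical path), then rerun the pipeline above.
+ image = Image.open("chest_xray.png").convert("RGB")
+ ```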
+
+ ### **2. Fine-Tuning on a New Dataset**
+ ```python
+ from datasets import load_dataset
+ from unsloth.trainer import UnslothVisionDataCollator
+ from trl import SFTTrainer, SFTConfig
+
+ FastVisionModel.for_training(model)  # Enable training mode
+
+ dataset = load_dataset("your_custom_dataset", split="train")
+ data_collator = UnslothVisionDataCollator(model, tokenizer)
+
+ trainer = SFTTrainer(
+     model=model,
+     tokenizer=tokenizer,
+     data_collator=data_collator,
+     train_dataset=dataset,  # see the conversion sketch below
+     args=SFTConfig(
+         per_device_train_batch_size=2,
+         gradient_accumulation_steps=4,
+         warmup_steps=5,
+         max_steps=30,
+         learning_rate=2e-4,
+         optim="adamw_8bit",
+         output_dir="outputs",
+         # Required for vision fine-tuning with the Unsloth collator:
+         remove_unused_columns=False,
+         dataset_text_field="",
+         dataset_kwargs={"skip_prepare_dataset": True},
+         max_seq_length=2048,
+     ),
+ )
+ trainer.train()
+ ```
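+
+ The Unsloth vision collator expects each training sample as a conversation pairing an image with its target text. A minimal conversion sketch, assuming hypothetical `"image"` and `"caption"` columns in your dataset:
+
+ ```python
+ def convert_to_conversation(sample):
+     # Field names here are hypothetical; adjust to your dataset's schema.
+     return {"messages": [
+         {"role": "user", "content": [
+             {"type": "image", "image": sample["image"]},
+             {"type": "text", "text": "Describe this medical image accurately."},
+         ]},
+         {"role": "assistant", "content": [
+             {"type": "text", "text": sample["caption"]},
+         ]},
+     ]}
+
+ converted_dataset = [convert_to_conversation(sample) for sample in dataset]
+ # Pass converted_dataset as train_dataset to the trainer above.
+ ```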
+
+ ---
+
+ ## Deployment
+ ### **Save Locally**
+ ```python
+ model.save_pretrained("Hnm_Llama3.2_(11B)-Vision_lora_model")
+ tokenizer.save_pretrained("Hnm_Llama3.2_(11B)-Vision_lora_model")
+ ```
+
+ ### **Push to Hugging Face**
+ ```python
+ model.push_to_hub("your_huggingface_username/Hnm_Llama3.2_(11B)-Vision_lora_model")
+ tokenizer.push_to_hub("your_huggingface_username/Hnm_Llama3.2_(11B)-Vision_lora_model")
+ ```
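+
+ Both snippets above save or upload only the small LoRA adapter weights, not the full model. If you need a standalone checkpoint with the adapters merged into the base weights, Unsloth provides merged save methods; a sketch, assuming the `push_to_hub_merged` API with `save_method="merged_16bit"`:
+
+ ```python
+ # Merge LoRA adapters into the base model and push full 16-bit weights.
+ model.push_to_hub_merged(
+     "your_huggingface_username/Hnm_Llama3.2_(11B)-Vision_lora_model",
+     tokenizer,
+     save_method="merged_16bit",
+ )
+ ```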
+
+ ---
+
+ ## Notes
+ - This model is optimized for vision-language tasks in the medical field but can be adapted to other applications.
+ - Uses **LoRA adapters**, so it can be fine-tuned efficiently with modest GPU resources.
+ - Supports the **Hugging Face Model Hub** for deployment and sharing.
+
+ ---
+
+ ## Citation
+ If you use this model, please cite:
+ ```bibtex
+ @misc{Hnm_Llama3.2_11B_Vision,
+   author = {Haq Nawaz Malik},
+   title  = {Fine-tuned Llama 3.2 (11B) Vision Model},
+   year   = {2025},
+   url    = {https://huggingface.co/your_huggingface_username/Hnm_Llama3.2_(11B)-Vision_lora_model}
+ }
+ ```
+
+ ---
+
+ ## Contact
+ For questions or support, reach out via:
+ - **GitHub**: [Haq-Nawaz-Malik](https://github.com/Haq-Nawaz-Malik)
+ - **Hugging Face**: [Omarrran](https://huggingface.co/Omarrran)