# Uploaded finetuned model

- **Developed by:** Haq Nawaz Malik
- **License:** apache-2.0
- **Finetuned from model:** unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit

# Documentation: Hnm_Llama3.2_(11B)-Vision_lora_model

## Overview

The **Hnm_Llama3.2_(11B)-Vision_lora_model** is a fine-tuned version of **Llama 3.2 (11B) Vision** built with **LoRA-based parameter-efficient fine-tuning (PEFT)**. It specializes in **vision-language tasks**, particularly **medical image captioning and understanding**.

This model was fine-tuned on a **Tesla T4 (Google Colab)** using **Unsloth**, a framework designed for efficient fine-tuning of large models.

---

## Features

- **Fine-tuned on Radiology Images**: Trained on the **Radiology_mini** dataset.
- **Supports Image Captioning**: Can describe medical images.
- **4-bit Quantization (QLoRA)**: Memory efficient, runs on consumer GPUs.
- **LoRA-based PEFT**: Trains only about **1% of the parameters**, significantly reducing computational cost (see the sketch after this list).
- **Multi-modal Capabilities**: Works with both **text and image** inputs.
- **Supports both Vision and Language fine-tuning**.
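
For context on the LoRA bullet above, here is a minimal sketch of how adapters of this kind are typically attached with Unsloth before training. It is illustrative, not the exact recipe used for this model: the rank, alpha, and layer-selection values are assumptions.

```python
from unsloth import FastVisionModel

# Load the 4-bit base model (same base family as this fine-tune)
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
)

# Attach LoRA adapters: only these small low-rank matrices are trained.
# r and lora_alpha below are illustrative assumptions.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,      # adapt the vision encoder
    finetune_language_layers=True,    # adapt the language model
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    random_state=3407,
)

# Check that only a small fraction of parameters is trainable
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```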

---

## Model Details

- **Base Model**: `unsloth/Llama-3.2-11B-Vision-Instruct`
- **Fine-tuning Method**: LoRA + 4-bit Quantization (QLoRA)
- **Dataset**: `unsloth/Radiology_mini`
- **Framework**: Unsloth + Hugging Face Transformers
- **Training Environment**: Google Colab (Tesla T4 GPU)

---

## Installation & Setup

### 1. Install Dependencies

```bash
pip install unsloth transformers torch datasets
```
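
The examples below assume a CUDA-capable GPU (the model was trained on a Tesla T4). As an optional sanity check before loading the 4-bit model, you can confirm that PyTorch sees your GPU:

```python
import torch

# The 4-bit quantized model requires a CUDA GPU
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"VRAM: {vram_gb:.1f} GB")
```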

### 2. Load the Model

```python
from unsloth import FastVisionModel

# Load the fine-tuned LoRA model from a local folder or a Hugging Face repo id
model, tokenizer = FastVisionModel.from_pretrained(
    "Hnm_Llama3.2_(11B)-Vision_lora_model",
    load_in_4bit=True,  # set to False for full precision
)
```

---

## Usage

### **1. Image Captioning Example**

```python
import torch
from datasets import load_dataset
from transformers import TextStreamer

FastVisionModel.for_inference(model)  # Enable inference mode

# Load a sample image from the dataset
dataset = load_dataset("unsloth/Radiology_mini", split="train")
image = dataset[0]["image"]
instruction = "Describe this medical image accurately."

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": instruction}
    ]}
]

# Build the chat prompt, then tokenize the image and text together
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to("cuda")

# Stream the generated caption token by token
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
                   use_cache=True, temperature=1.5, min_p=0.1)
```
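
To capture the caption as a Python string instead of streaming it, you can call `generate` without a streamer and decode only the newly generated tokens. This reuses `model`, `tokenizer`, and `inputs` from the example above and assumes the tokenizer/processor exposes `decode` (the Transformers Mllama processor forwards it to its tokenizer):

```python
# Generate without a streamer, then decode only the new tokens
output_ids = model.generate(**inputs, max_new_tokens=128, use_cache=True,
                            temperature=1.5, min_p=0.1)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
caption = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(caption)
```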

### **2. Fine-Tuning on a New Dataset**

```python
from datasets import load_dataset
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

FastVisionModel.for_training(model)  # Enable training mode

# The training data must be in the chat-style vision format expected by the
# collator (see the conversion sketch below)
dataset = load_dataset("your_custom_dataset", split="train")
data_collator = UnslothVisionDataCollator(model, tokenizer)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=data_collator,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=30,
        learning_rate=2e-4,
        optim="adamw_8bit",
        output_dir="outputs",
        # Settings typically required when training with the vision collator
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
    ),
)
trainer.train()
```
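
`UnslothVisionDataCollator` expects each training sample as a chat-style conversation that pairs the image with the target text. Below is a minimal conversion sketch; the field names `image` and `caption` are assumptions that match `Radiology_mini`, so adjust them for your own dataset:

```python
from datasets import load_dataset

instruction = "Describe this medical image accurately."

def convert_to_conversation(sample):
    # One user turn (instruction + image) and one assistant turn (target caption)
    return {"messages": [
        {"role": "user", "content": [
            {"type": "text", "text": instruction},
            {"type": "image", "image": sample["image"]},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["caption"]},
        ]},
    ]}

raw_dataset = load_dataset("unsloth/Radiology_mini", split="train")
converted_dataset = [convert_to_conversation(sample) for sample in raw_dataset]
# Pass converted_dataset as train_dataset= to the SFTTrainer above
```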

---

## Deployment

### **Save Locally**

```python
# Saves the LoRA adapter weights and the tokenizer/processor files
model.save_pretrained("Hnm_Llama3.2_(11B)-Vision_lora_model")
tokenizer.save_pretrained("Hnm_Llama3.2_(11B)-Vision_lora_model")
```
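
The call above saves only the LoRA adapter, not the full model weights. If you need standalone merged weights, recent Unsloth releases provide a merged-save helper; treat the snippet below as an assumption to verify against the Unsloth documentation for your installed version (the output folder name is hypothetical):

```python
# Assumption: save_pretrained_merged is available for vision models in your
# Unsloth version; it writes merged 16-bit weights instead of just the adapter.
model.save_pretrained_merged(
    "Hnm_Llama3.2_(11B)-Vision_merged_16bit",  # hypothetical output folder
    tokenizer,
)
```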

### **Push to Hugging Face**

```python
# Replace "your_huggingface_username" with your actual Hugging Face username
model.push_to_hub("your_huggingface_username/Hnm_Llama3.2_(11B)-Vision_lora_model")
tokenizer.push_to_hub("your_huggingface_username/Hnm_Llama3.2_(11B)-Vision_lora_model")
```
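
After pushing, you can verify the upload by loading the adapter straight from the Hub with the same API used in the setup section (the repo id below is this README's placeholder):

```python
from unsloth import FastVisionModel

# Reload the pushed adapter directly from the Hub to confirm it works
model, tokenizer = FastVisionModel.from_pretrained(
    "your_huggingface_username/Hnm_Llama3.2_(11B)-Vision_lora_model",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)
```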

---

## Notes

- This model is optimized for vision-language tasks in the medical domain but can be adapted to other applications.
- It uses **LoRA adapters**, so it can be fine-tuned efficiently with modest GPU resources.
- It can be deployed and shared via the **Hugging Face Model Hub**.

---

## Citation

If you use this model, please cite:

```
@misc{Hnm_Llama3.2_11B_Vision,
  author = {Haq Nawaz Malik},
  title  = {Fine-tuned Llama 3.2 (11B) Vision Model},
  year   = {2025},
  url    = {https://huggingface.co/your_huggingface_username/Hnm_Llama3.2_(11B)-Vision_lora_model}
}
```

---

## Contact

For questions or support, reach out via:

- **GitHub**: [Haq-Nawaz-Malik](https://github.com/Haq-Nawaz-Malik)
- **Hugging Face**: [Omarrran](https://huggingface.co/Omarrran)