# Multi-Model Orchestrator: Parent-Child LLM System
A sophisticated multi-model orchestration system that manages parent-child LLM relationships, specifically integrating the [CLIP-GPT2 Image Captioner](https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner) and [Flickr30k Text-to-Image](https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image) models.
## 🚀 Features
### **Parent Orchestrator**
- **Intelligent Task Routing**: Automatically routes tasks to the appropriate child model (see the sketch after this list)
- **Model Management**: Handles loading, caching, and lifecycle of child models
- **Error Handling**: Robust error handling and recovery mechanisms
- **Task History**: Comprehensive logging and monitoring of all operations
- **Async Support**: Both synchronous and asynchronous processing modes
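For intuition, here is a minimal sketch of the routing pattern described above. The class and handler names are illustrative only, not the actual implementation in `multi_model_orchestrator.py`:

```python
# Illustrative sketch only: a minimal task-routing pattern
class ParentOrchestrator:
    def __init__(self):
        self.children = {}  # maps task type -> child model handler

    def add_child_model(self, task_type, handler):
        self.children[task_type] = handler

    def route_task(self, task_type, payload):
        # Dispatch the payload to the registered child model, if any
        if task_type not in self.children:
            raise ValueError(f"No child model registered for '{task_type}'")
        return self.children[task_type](payload)
```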
### **Child Models**
- **CLIP-GPT2 Image Captioner**: Converts images to descriptive text captions
- **Flickr30k Text-to-Image**: Generates images from text descriptions
- **Extensible Architecture**: Easy to add new child models
### **Advanced Capabilities**
- **Multimodal Processing**: Combines multiple child models for complex tasks
- **Batch Processing**: Handle multiple tasks efficiently
- **Performance Monitoring**: Track processing times and success rates
- **Memory Management**: Efficient GPU/CPU memory usage
## 📁 Project Structure
```
├── multi_model_orchestrator.py # Advanced orchestrator with full features
├── simple_orchestrator.py # Simplified interface matching original code
├── multi_model_example.py # Comprehensive examples and demonstrations
├── multi_model_requirements.txt # Dependencies for multi-model system
└── MULTI_MODEL_README.md # This file
```
## 🛠️ Installation
1. **Install dependencies:**
```bash
pip install -r multi_model_requirements.txt
```
2. **Verify installation:**
```python
import torch
from transformers import CLIPProcessor
from diffusers import StableDiffusionPipeline
print("All dependencies installed successfully!")
```
## 🎯 Quick Start
### **Basic Usage (Matching Original Code)**
```python
from simple_orchestrator import SimpleMultiModelOrchestrator
# Initialize orchestrator
orchestrator = SimpleMultiModelOrchestrator()
orchestrator.initialize_models()
# Generate caption from image
caption = orchestrator.generate_caption("sample_image.jpg")
print(f"Caption: {caption}")
# Generate image from text
image_path = orchestrator.generate_image("A beautiful sunset over mountains")
print(f"Generated image: {image_path}")
# Route tasks
caption = orchestrator.route_task("caption", "sample_image.jpg")
image_path = orchestrator.route_task("generate_image", "A cat on a windowsill")
```
### **Advanced Usage**
```python
from multi_model_orchestrator import MultiModelOrchestrator
import asyncio
async def main():
    # Initialize advanced orchestrator
    orchestrator = MultiModelOrchestrator()
    await orchestrator.initialize()

    # Multimodal processing
    results = await orchestrator.process_multimodal(
        image_path="sample_image.jpg",
        text_prompt="A serene landscape with mountains"
    )
    print("Results:", results)

asyncio.run(main())
```
## 🔧 Model Integration
### **Child Model 1: CLIP-GPT2 Image Captioner**
- **Model**: `kunaliitkgp09/clip-gpt2-image-captioner`
- **Task**: Image-to-text captioning
- **Input**: Image file path
- **Output**: Descriptive text caption
- **Performance**: ~40% accuracy on test samples
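If you want to call the captioner directly rather than through the orchestrator, a sketch along these lines should work. It assumes the checkpoint is packaged as a standard `transformers` VisionEncoderDecoderModel; the base processor and tokenizer choices are also assumptions:

```python
import torch
from PIL import Image
from transformers import CLIPProcessor, GPT2Tokenizer, VisionEncoderDecoderModel

# Assumption: the checkpoint follows the VisionEncoderDecoderModel format
model = VisionEncoderDecoderModel.from_pretrained("kunaliitkgp09/clip-gpt2-image-captioner")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")  # assumed base processor
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

image = Image.open("sample_image.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
with torch.no_grad():
    output_ids = model.generate(pixel_values, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```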
### **Child Model 2: Flickr30k Text-to-Image**
- **Model**: `kunaliitkgp09/flickr30k-text-to-image`
- **Task**: Text-to-image generation
- **Input**: Text prompt
- **Output**: Generated image file
- **Performance**: Fine-tuned on Flickr30k dataset
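Standalone use is similar. This sketch assumes the checkpoint loads as a regular `diffusers` StableDiffusionPipeline, consistent with the import used in the installation check:

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumption: the checkpoint is a standard Stable Diffusion pipeline
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("kunaliitkgp09/flickr30k-text-to-image")
pipe = pipe.to(device)

image = pipe("A beautiful sunset over mountains").images[0]
image.save("generated_image.png")
```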
## 📊 Usage Examples
### **1. Image Captioning**
```python
# Generate caption from image
caption = orchestrator.generate_caption("path/to/image.jpg")
print(f"Generated Caption: {caption}")
```
### **2. Text-to-Image Generation**
```python
# Generate image from text
image_path = orchestrator.generate_image("A majestic eagle soaring over mountains")
print(f"Generated Image: {image_path}")
```
### **3. Multimodal Processing**
```python
# Process both image and text together
results = orchestrator.process_multimodal_task(
    image_path="sample_image.jpg",
    text_prompt="A serene landscape with mountains"
)
print("Caption:", results["caption"])
print("Generated Image:", results["generated_image"])
print("Analysis:", results["analysis_prompt"])
```
### **4. Async Processing**
```python
# Async version for better performance
async def async_example():
    results = await orchestrator.process_multimodal_async(
        image_path="sample_image.jpg",
        text_prompt="A futuristic cityscape"
    )
    return results
```
### **5. Batch Processing**
```python
# Process multiple tasks
image_tasks = [
    "A beautiful sunset",
    "A cozy coffee shop",
    "A vibrant garden"
]

for prompt in image_tasks:
    image_path = orchestrator.generate_image(prompt)
    print(f"Generated: {image_path}")
```
## 🔍 Task History and Monitoring
```python
# Get orchestrator status
status = orchestrator.get_status()
print(f"Status: {status}")
# Get task history
history = orchestrator.get_task_history()
for task in history:
    print(f"Task: {task['task_type']}, Time: {task['processing_time']:.2f}s")
# Save task history
orchestrator.save_task_history("my_tasks.json")
```
## ⚙️ Configuration Options
### **Model Configuration**
```python
# Custom model parameters
orchestrator = SimpleMultiModelOrchestrator(device="cuda") # or "cpu"
# Custom generation parameters
image_path = orchestrator.generate_image(
    "A beautiful landscape",
    output_path="custom_output.png"
)
```
### **Async Configuration**
```python
# Async orchestrator with concurrent processing
# (assumed import path; adjust if AsyncMultiModelOrchestrator lives elsewhere)
from multi_model_orchestrator import AsyncMultiModelOrchestrator

async_orchestrator = AsyncMultiModelOrchestrator()

# Process tasks concurrently
results = await async_orchestrator.process_multimodal_async(
    image_path="image.jpg",
    text_prompt="prompt"
)
```
## 🎯 Use Cases
### **1. Content Creation**
- Generate captions for social media images
- Create images from text descriptions
- Multimodal content analysis
### **2. Research and Development**
- Model performance comparison
- Multimodal AI research
- Prototype development
### **3. Production Systems**
- Automated content generation
- Image analysis pipelines
- Text-to-image applications
### **4. Educational Applications**
- AI model demonstration
- Multimodal learning systems
- Research toolkits
## 🔧 Advanced Features
### **Error Handling**
```python
try:
    caption = orchestrator.generate_caption("image.jpg")
except Exception as e:
    print(f"Error: {e}")
    # Handle the error gracefully (retry, fall back, or log)
```
### **Performance Optimization**
```python
# Use async for better performance
async def optimized_processing():
tasks = [
orchestrator.generate_caption_async("image1.jpg"),
orchestrator.generate_caption_async("image2.jpg"),
orchestrator.generate_image_async("prompt1"),
orchestrator.generate_image_async("prompt2")
]
results = await asyncio.gather(*tasks)
return results
```
### **Custom Model Integration**
```python
# Add new child models
class CustomChildModel:
    def __init__(self, model_name):
        self.model = load_model(model_name)  # placeholder: supply your own loader

    def process(self, input_data):
        # Custom processing logic goes here
        return self.model(input_data)  # placeholder call

# Integrate with orchestrator
orchestrator.add_child_model("custom_model", CustomChildModel("model_name"))
```
## 📈 Performance Metrics
The orchestrator tracks various performance metrics:
- **Processing Time**: Time taken for each task
- **Success Rate**: Percentage of successful operations
- **Memory Usage**: GPU/CPU memory consumption
- **Model Load Times**: Time to initialize each child model
- **Task Throughput**: Number of tasks processed per second
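Per-task processing times are available from the history shown earlier; a rough aggregation might look like this (the `success` field is an assumption beyond the fields demonstrated above):

```python
history = orchestrator.get_task_history()
if history:
    times = [t["processing_time"] for t in history]
    print(f"Tasks processed: {len(history)}")
    print(f"Average time: {sum(times) / len(times):.2f}s")
    # 'success' is a hypothetical field; adjust to your history schema
    ok = sum(1 for t in history if t.get("success", True))
    print(f"Success rate: {100 * ok / len(history):.1f}%")
```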
## 🚨 Important Notes
### **System Requirements**
- **GPU**: Recommended for optimal performance (CUDA compatible)
- **RAM**: 8GB+ for smooth operation
- **Storage**: 5GB+ for model downloads and generated content
- **Python**: 3.8+ required
### **Model Downloads**
- Models are downloaded automatically on first use
- CLIP-GPT2: ~500MB
- Stable Diffusion: ~4GB
- Total initial download: ~5GB
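To avoid a long pause on first use, you can pre-fetch both checkpoints with the standard `huggingface_hub` API:

```python
from huggingface_hub import snapshot_download

# Downloads go to the default Hugging Face cache (override via HF_HOME)
snapshot_download("kunaliitkgp09/clip-gpt2-image-captioner")
snapshot_download("kunaliitkgp09/flickr30k-text-to-image")
```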
### **Memory Management**
- Models are loaded into GPU memory when available
- CPU fallback available for systems without GPU
- Memory usage scales with batch size and model complexity
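A simple way to honor the CPU fallback is to pick the device at startup, matching the `device` parameter shown in the configuration section:

```python
import torch
from simple_orchestrator import SimpleMultiModelOrchestrator

# Fall back to CPU automatically when no CUDA device is present
device = "cuda" if torch.cuda.is_available() else "cpu"
orchestrator = SimpleMultiModelOrchestrator(device=device)
```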
## 🤝 Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues for:
- New child model integrations
- Performance improvements
- Bug fixes
- Documentation enhancements
- Feature requests
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- **CLIP-GPT2 Model**: [kunaliitkgp09/clip-gpt2-image-captioner](https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner)
- **Stable Diffusion Model**: [kunaliitkgp09/flickr30k-text-to-image](https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image)
- **Hugging Face**: For providing the model hosting platform
- **PyTorch**: For the deep learning framework
- **Transformers**: For the model loading and processing utilities
## 📚 References
1. **CLIP**: "Learning Transferable Visual Models From Natural Language Supervision" (Radford et al., 2021)
2. **GPT-2**: "Language Models are Unsupervised Multitask Learners" (Radford et al., 2019)
3. **Stable Diffusion**: "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)
4. **Flickr30k**: "From Image Descriptions to Visual Denotations" (Young et al., 2014)
## 🔗 Links
- **CLIP-GPT2 Model**: https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner
- **Flickr30k Text-to-Image**: https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image
- **Hugging Face Hub**: https://huggingface.co/
- **PyTorch**: https://pytorch.org/
- **Transformers**: https://huggingface.co/docs/transformers/
---
**Happy Orchestrating! 🚀**