# Multi-Model Orchestrator: Parent-Child LLM System
A sophisticated multi-model orchestration system that manages parent-child LLM relationships, specifically integrating the [CLIP-GPT2 Image Captioner](https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner) and [Flickr30k Text-to-Image](https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image) models.
## 🚀 Features
### **Parent Orchestrator**
- **Intelligent Task Routing**: Automatically routes tasks to the appropriate child models (see the routing sketch after this list)
- **Model Management**: Handles loading, caching, and lifecycle of child models
- **Error Handling**: Robust error handling and recovery mechanisms
- **Task History**: Comprehensive logging and monitoring of all operations
- **Async Support**: Both synchronous and asynchronous processing modes
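The routing logic is the core of the parent orchestrator. The sketch below is only an illustration of how task names could be mapped to child-model handlers; it mirrors the `route_task` interface shown in the Quick Start section, but it is not the actual implementation.

```python
# Simplified routing sketch (illustration only, not the real orchestrator code).
class RoutingSketch:
    def __init__(self):
        # Dispatch table mapping task types to handler methods
        self.handlers = {
            "caption": self.generate_caption,
            "generate_image": self.generate_image,
        }

    def route_task(self, task_type, payload):
        handler = self.handlers.get(task_type)
        if handler is None:
            raise ValueError(f"Unknown task type: {task_type}")
        return handler(payload)

    def generate_caption(self, image_path):
        raise NotImplementedError("delegate to the CLIP-GPT2 child model")

    def generate_image(self, prompt):
        raise NotImplementedError("delegate to the Flickr30k text-to-image child model")
```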
### **Child Models**
- **CLIP-GPT2 Image Captioner**: Converts images to descriptive text captions
- **Flickr30k Text-to-Image**: Generates images from text descriptions
- **Extensible Architecture**: Easy to add new child models
### **Advanced Capabilities**
- **Multimodal Processing**: Combines multiple child models for complex tasks
- **Batch Processing**: Handle multiple tasks efficiently
- **Performance Monitoring**: Track processing times and success rates
- **Memory Management**: Efficient GPU/CPU memory usage
## 📁 Project Structure
```
├── multi_model_orchestrator.py # Advanced orchestrator with full features
├── simple_orchestrator.py # Simplified interface matching original code
├── multi_model_example.py # Comprehensive examples and demonstrations
├── multi_model_requirements.txt # Dependencies for multi-model system
└── MULTI_MODEL_README.md # This file
```
## 🛠️ Installation
1. **Install dependencies:**
```bash
pip install -r multi_model_requirements.txt
```
2. **Verify installation:**
```python
import torch
from transformers import CLIPProcessor
from diffusers import StableDiffusionPipeline
print("All dependencies installed successfully!")
```
## 🎯 Quick Start
### **Basic Usage (Matching Original Code)**
```python
from simple_orchestrator import SimpleMultiModelOrchestrator
# Initialize orchestrator
orchestrator = SimpleMultiModelOrchestrator()
orchestrator.initialize_models()
# Generate caption from image
caption = orchestrator.generate_caption("sample_image.jpg")
print(f"Caption: {caption}")
# Generate image from text
image_path = orchestrator.generate_image("A beautiful sunset over mountains")
print(f"Generated image: {image_path}")
# Route tasks
caption = orchestrator.route_task("caption", "sample_image.jpg")
image_path = orchestrator.route_task("generate_image", "A cat on a windowsill")
```
### **Advanced Usage**
```python
from multi_model_orchestrator import MultiModelOrchestrator
import asyncio
async def main():
    # Initialize the advanced orchestrator
    orchestrator = MultiModelOrchestrator()
    await orchestrator.initialize()

    # Multimodal processing
    results = await orchestrator.process_multimodal(
        image_path="sample_image.jpg",
        text_prompt="A serene landscape with mountains"
    )
    print("Results:", results)

asyncio.run(main())
```
## 🔧 Model Integration
### **Child Model 1: CLIP-GPT2 Image Captioner**
- **Model**: `kunaliitkgp09/clip-gpt2-image-captioner`
- **Task**: Image-to-text captioning
- **Input**: Image file path
- **Output**: Descriptive text caption
- **Performance**: ~40% accuracy on test samples
### **Child Model 2: Flickr30k Text-to-Image**
- **Model**: `kunaliitkgp09/flickr30k-text-to-image`
- **Task**: Text-to-image generation
- **Input**: Text prompt
- **Output**: Generated image file
- **Performance**: Fine-tuned on the Flickr30k dataset (a loading sketch follows this list)
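Both child models are hosted on the Hugging Face Hub and are downloaded automatically on first use (see the notes below). As a hedged sketch, the text-to-image child can plausibly be loaded through `diffusers`' `StableDiffusionPipeline`, which matches the dependency check in the Installation section; the actual loading code lives in `multi_model_orchestrator.py`.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumption: the Flickr30k model is a Stable Diffusion fine-tune that can be
# loaded with StableDiffusionPipeline (consistent with the installation check).
device = "cuda" if torch.cuda.is_available() else "cpu"

text_to_image = StableDiffusionPipeline.from_pretrained(
    "kunaliitkgp09/flickr30k-text-to-image",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Generate and save an image from a text prompt
image = text_to_image("A beautiful sunset over mountains").images[0]
image.save("generated_image.png")
```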
## 📊 Usage Examples
### **1. Image Captioning**
```python
# Generate caption from image
caption = orchestrator.generate_caption("path/to/image.jpg")
print(f"Generated Caption: {caption}")
```
### **2. Text-to-Image Generation**
```python
# Generate image from text
image_path = orchestrator.generate_image("A majestic eagle soaring over mountains")
print(f"Generated Image: {image_path}")
```
### **3. Multimodal Processing**
```python
# Process both image and text together
results = orchestrator.process_multimodal_task(
    image_path="sample_image.jpg",
    text_prompt="A serene landscape with mountains"
)
print("Caption:", results["caption"])
print("Generated Image:", results["generated_image"])
print("Analysis:", results["analysis_prompt"])
```
### **4. Async Processing**
```python
# Async version for better performance
async def async_example():
    results = await orchestrator.process_multimodal_async(
        image_path="sample_image.jpg",
        text_prompt="A futuristic cityscape"
    )
    return results
```
### **5. Batch Processing**
```python
# Process multiple tasks
image_tasks = [
    "A beautiful sunset",
    "A cozy coffee shop",
    "A vibrant garden"
]

for prompt in image_tasks:
    image_path = orchestrator.generate_image(prompt)
    print(f"Generated: {image_path}")
```
## 🔍 Task History and Monitoring
```python
# Get orchestrator status
status = orchestrator.get_status()
print(f"Status: {status}")
# Get task history
history = orchestrator.get_task_history()
for task in history:
    print(f"Task: {task['task_type']}, Time: {task['processing_time']:.2f}s")
# Save task history
orchestrator.save_task_history("my_tasks.json")
```
## ⚙️ Configuration Options
### **Model Configuration**
```python
# Custom model parameters
orchestrator = SimpleMultiModelOrchestrator(device="cuda") # or "cpu"
# Custom generation parameters
image_path = orchestrator.generate_image(
    "A beautiful landscape",
    output_path="custom_output.png"
)
```
### **Async Configuration**
```python
# Async orchestrator with concurrent processing
async_orchestrator = AsyncMultiModelOrchestrator()
# Process tasks concurrently
results = await async_orchestrator.process_multimodal_async(
    image_path="image.jpg",
    text_prompt="prompt"
)
```
## 🎯 Use Cases
### **1. Content Creation**
- Generate captions for social media images
- Create images from text descriptions
- Multimodal content analysis
### **2. Research and Development**
- Model performance comparison
- Multimodal AI research
- Prototype development
### **3. Production Systems**
- Automated content generation
- Image analysis pipelines
- Text-to-image applications
### **4. Educational Applications**
- AI model demonstration
- Multimodal learning systems
- Research toolkits
## 🔧 Advanced Features
### **Error Handling**
```python
try:
    caption = orchestrator.generate_caption("image.jpg")
except Exception as e:
    print(f"Error: {e}")
    # Handle the error gracefully
```
### **Performance Optimization**
```python
# Use async for better performance
async def optimized_processing():
    tasks = [
        orchestrator.generate_caption_async("image1.jpg"),
        orchestrator.generate_caption_async("image2.jpg"),
        orchestrator.generate_image_async("prompt1"),
        orchestrator.generate_image_async("prompt2")
    ]
    results = await asyncio.gather(*tasks)
    return results
```
### **Custom Model Integration**
```python
# Add new child models
class CustomChildModel:
    def __init__(self, model_name):
        # Replace with your own model-loading logic
        self.model = load_model(model_name)

    def process(self, input_data):
        # Custom processing logic
        result = self.model(input_data)
        return result

# Integrate with orchestrator
orchestrator.add_child_model("custom_model", CustomChildModel("model_name"))
```
## 📈 Performance Metrics
The orchestrator tracks various performance metrics:
- **Processing Time**: Time taken for each task
- **Success Rate**: Percentage of successful operations
- **Memory Usage**: GPU/CPU memory consumption
- **Model Load Times**: Time to initialize each child model
- **Task Throughput**: Number of tasks processed per second
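The field names below are purely illustrative; the orchestrator defines its own history schema. Still, a minimal sketch of how processing time and success rate could be recorded per task looks like this:

```python
import time

# Hypothetical per-task metric record; field names are illustrative only.
task_history = []

def run_and_record(task_type, fn, *args, **kwargs):
    """Run a task and record its processing time and success flag."""
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        success = True
    except Exception:
        result = None
        success = False
    task_history.append({
        "task_type": task_type,
        "processing_time": time.perf_counter() - start,
        "success": success,
    })
    return result

# Success rate over all recorded tasks
success_rate = sum(t["success"] for t in task_history) / max(len(task_history), 1)
```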
## 🚨 Important Notes
### **System Requirements**
- **GPU**: Recommended for optimal performance (CUDA compatible)
- **RAM**: 8GB+ for smooth operation
- **Storage**: 5GB+ for model downloads and generated content
- **Python**: 3.8+ required
### **Model Downloads**
- Models are downloaded automatically on first use
- CLIP-GPT2: ~500MB
- Stable Diffusion: ~4GB
- Total initial download: ~5GB
### **Memory Management**
- Models are loaded into GPU memory when available
- CPU fallback available for systems without a GPU (see the device-selection sketch after this list)
- Memory usage scales with batch size and model complexity
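The GPU-with-CPU-fallback behaviour follows the usual PyTorch pattern; a minimal sketch using the `device` argument shown in the Configuration section:

```python
import torch
from simple_orchestrator import SimpleMultiModelOrchestrator

# Use the GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
orchestrator = SimpleMultiModelOrchestrator(device=device)
orchestrator.initialize_models()
```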
## 🤝 Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues for:
- New child model integrations
- Performance improvements
- Bug fixes
- Documentation enhancements
- Feature requests
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- **CLIP-GPT2 Model**: [kunaliitkgp09/clip-gpt2-image-captioner](https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner)
- **Stable Diffusion Model**: [kunaliitkgp09/flickr30k-text-to-image](https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image)
- **Hugging Face**: For providing the model hosting platform
- **PyTorch**: For the deep learning framework
- **Transformers**: For the model loading and processing utilities
## 📚 References
1. **CLIP**: "Learning Transferable Visual Models From Natural Language Supervision" (Radford et al., 2021)
2. **GPT-2**: "Language Models are Unsupervised Multitask Learners" (Radford et al., 2019)
3. **Stable Diffusion**: "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)
4. **Flickr30k**: "From Image Descriptions to Visual Denotations" (Young et al., 2014)
## 🔗 Links
- **CLIP-GPT2 Model**: https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner
- **Flickr30k Text-to-Image**: https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image
- **Hugging Face Hub**: https://huggingface.co/
- **PyTorch**: https://pytorch.org/
- **Transformers**: https://huggingface.co/docs/transformers/
---
**Happy Orchestrating! 🚀**