# Multi-Model Orchestrator: Parent-Child LLM System

A sophisticated multi-model orchestration system that manages parent-child LLM relationships, specifically integrating the [CLIP-GPT2 Image Captioner](https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner) and [Flickr30k Text-to-Image](https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image) models.

## 🚀 Features

### **Parent Orchestrator**
- **Intelligent Task Routing**: Automatically routes tasks to appropriate child models (see the sketch after this list)
- **Model Management**: Handles loading, caching, and lifecycle of child models
- **Error Handling**: Robust error handling and recovery mechanisms
- **Task History**: Comprehensive logging and monitoring of all operations
- **Async Support**: Both synchronous and asynchronous processing modes

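The routing idea can be pictured as a small dispatch table: the parent keeps a registry of child models and forwards each task by type. This is a minimal illustrative sketch with hypothetical names, not the actual implementation:

```python
class ParentOrchestrator:
    """Toy illustration of parent-child routing (names are hypothetical)."""

    def __init__(self):
        self.children = {}  # maps task type -> child model

    def add_child_model(self, task_type, model):
        self.children[task_type] = model

    def route_task(self, task_type, payload):
        # Dispatch to the registered child, failing loudly on unknown tasks
        if task_type not in self.children:
            raise ValueError(f"No child model registered for task '{task_type}'")
        return self.children[task_type].process(payload)
```
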
### **Child Models**
- **CLIP-GPT2 Image Captioner**: Converts images to descriptive text captions
- **Flickr30k Text-to-Image**: Generates images from text descriptions
- **Extensible Architecture**: Easy to add new child models

### **Advanced Capabilities**
- **Multimodal Processing**: Combines multiple child models for complex tasks
- **Batch Processing**: Handles multiple tasks efficiently
- **Performance Monitoring**: Tracks processing times and success rates
- **Memory Management**: Efficient GPU/CPU memory usage

## 📁 Project Structure

```
├── multi_model_orchestrator.py   # Advanced orchestrator with full features
├── simple_orchestrator.py        # Simplified interface matching original code
├── multi_model_example.py        # Comprehensive examples and demonstrations
├── multi_model_requirements.txt  # Dependencies for multi-model system
└── MULTI_MODEL_README.md         # This file
```

## 🛠️ Installation

1. **Install dependencies:**
```bash
pip install -r multi_model_requirements.txt
```

2. **Verify installation:**
```python
import torch
from transformers import CLIPProcessor
from diffusers import StableDiffusionPipeline
print("All dependencies installed successfully!")
```

## 🎯 Quick Start

### **Basic Usage (Matching Original Code)**

```python
from simple_orchestrator import SimpleMultiModelOrchestrator

# Initialize orchestrator
orchestrator = SimpleMultiModelOrchestrator()
orchestrator.initialize_models()

# Generate caption from image
caption = orchestrator.generate_caption("sample_image.jpg")
print(f"Caption: {caption}")

# Generate image from text
image_path = orchestrator.generate_image("A beautiful sunset over mountains")
print(f"Generated image: {image_path}")

# Route tasks
caption = orchestrator.route_task("caption", "sample_image.jpg")
image_path = orchestrator.route_task("generate_image", "A cat on a windowsill")
```

### **Advanced Usage**

```python
from multi_model_orchestrator import MultiModelOrchestrator
import asyncio

async def main():
    # Initialize advanced orchestrator
    orchestrator = MultiModelOrchestrator()
    await orchestrator.initialize()

    # Multimodal processing
    results = await orchestrator.process_multimodal(
        image_path="sample_image.jpg",
        text_prompt="A serene landscape with mountains"
    )

    print("Results:", results)

asyncio.run(main())
```

## 🔧 Model Integration

### **Child Model 1: CLIP-GPT2 Image Captioner**
- **Model**: `kunaliitkgp09/clip-gpt2-image-captioner`
- **Task**: Image-to-text captioning
- **Input**: Image file path
- **Output**: Descriptive text caption
- **Performance**: ~40% accuracy on test samples

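For reference, the captioner can also be called outside the orchestrator. The sketch below assumes the checkpoint loads through the `transformers` `VisionEncoderDecoderModel` convention (a CLIP vision encoder paired with a GPT-2 decoder); that convention is an assumption here, so check the model card for the exact loading code:

```python
from PIL import Image
from transformers import AutoTokenizer, CLIPProcessor, VisionEncoderDecoderModel

# Assumption: the checkpoint follows the VisionEncoderDecoderModel layout.
model_id = "kunaliitkgp09/clip-gpt2-image-captioner"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preprocess the image, generate token ids, then decode to text
pixel_values = processor(images=Image.open("sample_image.jpg"), return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
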
### **Child Model 2: Flickr30k Text-to-Image**
- **Model**: `kunaliitkgp09/flickr30k-text-to-image`
- **Task**: Text-to-image generation
- **Input**: Text prompt
- **Output**: Generated image file
- **Performance**: Fine-tuned on the Flickr30k dataset

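Likewise, the text-to-image child can be exercised directly with `diffusers`. This sketch assumes the checkpoint is a standard Stable Diffusion pipeline, which the dependency list and the ~4GB download noted below suggest:

```python
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: the checkpoint is a standard Stable Diffusion pipeline.
pipe = StableDiffusionPipeline.from_pretrained("kunaliitkgp09/flickr30k-text-to-image")
pipe = pipe.to(device)

# Generate a single image from a prompt and save it to disk
image = pipe("A beautiful sunset over mountains").images[0]
image.save("sunset.png")
```
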
## 📊 Usage Examples

### **1. Image Captioning**
```python
# Generate caption from image
caption = orchestrator.generate_caption("path/to/image.jpg")
print(f"Generated Caption: {caption}")
```

### **2. Text-to-Image Generation**
```python
# Generate image from text
image_path = orchestrator.generate_image("A majestic eagle soaring over mountains")
print(f"Generated Image: {image_path}")
```

### **3. Multimodal Processing**
```python
# Process both image and text together
results = orchestrator.process_multimodal_task(
    image_path="sample_image.jpg",
    text_prompt="A serene landscape with mountains"
)

print("Caption:", results["caption"])
print("Generated Image:", results["generated_image"])
print("Analysis:", results["analysis_prompt"])
```

### **4. Async Processing**
```python
# Async version for better performance
async def async_example():
    results = await orchestrator.process_multimodal_async(
        image_path="sample_image.jpg",
        text_prompt="A futuristic cityscape"
    )
    return results
```

### **5. Batch Processing**
```python
# Process multiple tasks
image_tasks = [
    "A beautiful sunset",
    "A cozy coffee shop",
    "A vibrant garden"
]

for prompt in image_tasks:
    image_path = orchestrator.generate_image(prompt)
    print(f"Generated: {image_path}")
```

## 🔍 Task History and Monitoring

```python
# Get orchestrator status
status = orchestrator.get_status()
print(f"Status: {status}")

# Get task history
history = orchestrator.get_task_history()
for task in history:
    print(f"Task: {task['task_type']}, Time: {task['processing_time']:.2f}s")

# Save task history
orchestrator.save_task_history("my_tasks.json")
```

## ⚙️ Configuration Options

### **Model Configuration**
```python
# Custom model parameters
orchestrator = SimpleMultiModelOrchestrator(device="cuda")  # or "cpu"

# Custom generation parameters
image_path = orchestrator.generate_image(
    "A beautiful landscape",
    output_path="custom_output.png"
)
```

### **Async Configuration**
```python
import asyncio

from multi_model_orchestrator import AsyncMultiModelOrchestrator  # assumed export

async def main():
    # Async orchestrator with concurrent processing
    async_orchestrator = AsyncMultiModelOrchestrator()

    # Process tasks concurrently
    results = await async_orchestrator.process_multimodal_async(
        image_path="image.jpg",
        text_prompt="prompt"
    )
    return results

results = asyncio.run(main())
```

## 🎯 Use Cases

### **1. Content Creation**
- Generate captions for social media images
- Create images from text descriptions
- Multimodal content analysis

### **2. Research and Development**
- Model performance comparison
- Multimodal AI research
- Prototype development

### **3. Production Systems**
- Automated content generation
- Image analysis pipelines
- Text-to-image applications

### **4. Educational Applications**
- AI model demonstration
- Multimodal learning systems
- Research toolkits

## 🔧 Advanced Features

### **Error Handling**
```python
try:
    caption = orchestrator.generate_caption("image.jpg")
except Exception as e:
    print(f"Error: {e}")
    # Handle error gracefully
```

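On top of plain try/except, transient failures (e.g. I/O hiccups or out-of-memory errors) can be retried. A small illustrative wrapper, not part of the orchestrator API:

```python
import time

def with_retries(fn, attempts=3, delay=1.0):
    """Call fn(), retrying up to `attempts` times with a fixed delay."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as e:
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({e}); retrying in {delay}s")
            time.sleep(delay)

caption = with_retries(lambda: orchestrator.generate_caption("image.jpg"))
```
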
### **Performance Optimization**
```python
import asyncio

# Use async for better performance
async def optimized_processing():
    tasks = [
        orchestrator.generate_caption_async("image1.jpg"),
        orchestrator.generate_caption_async("image2.jpg"),
        orchestrator.generate_image_async("prompt1"),
        orchestrator.generate_image_async("prompt2")
    ]

    results = await asyncio.gather(*tasks)
    return results
```

### **Custom Model Integration**
```python
# Add new child models
class CustomChildModel:
    def __init__(self, model_name):
        # Load your model here (load_model is a placeholder for your own loading logic)
        self.model = load_model(model_name)

    def process(self, input_data):
        # Custom processing logic goes here
        result = self.model(input_data)
        return result

# Integrate with orchestrator
orchestrator.add_child_model("custom_model", CustomChildModel("model_name"))
```

## 📈 Performance Metrics

The orchestrator tracks various performance metrics (a sketch for deriving some of them from the task history follows the list):

- **Processing Time**: Time taken for each task
- **Success Rate**: Percentage of successful operations
- **Memory Usage**: GPU/CPU memory consumption
- **Model Load Times**: Time to initialize each child model
- **Task Throughput**: Number of tasks processed per second

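A minimal sketch using the `processing_time` field shown under Task History and Monitoring; the `success` field is an assumption about the history schema:

```python
history = orchestrator.get_task_history()

times = [task["processing_time"] for task in history]
avg_time = sum(times) / len(times)

# "success" is an assumed field; adapt to the actual history schema
success_rate = sum(1 for task in history if task.get("success", True)) / len(history)

print(f"Average time: {avg_time:.2f}s, success rate: {success_rate:.0%}")
```
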
## 🚨 Important Notes

### **System Requirements**
- **GPU**: Recommended for optimal performance (CUDA compatible)
- **RAM**: 8GB+ for smooth operation
- **Storage**: 5GB+ for model downloads and generated content
- **Python**: 3.8+ required

### **Model Downloads**
- Models are downloaded automatically on first use
- CLIP-GPT2: ~500MB
- Stable Diffusion: ~4GB
- Total initial download: ~5GB

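To avoid blocking on first use, the models can be fetched ahead of time with `huggingface_hub` (installed alongside `transformers` and `diffusers`); a small sketch:

```python
from huggingface_hub import snapshot_download

# Pre-download both child models so first use doesn't wait on ~5GB
for repo_id in [
    "kunaliitkgp09/clip-gpt2-image-captioner",
    "kunaliitkgp09/flickr30k-text-to-image",
]:
    snapshot_download(repo_id)
```
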
### **Memory Management**
- Models are loaded into GPU memory when available
- CPU fallback available for systems without GPU
- Memory usage scales with batch size and model complexity

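A common pattern for the CPU fallback, using the `device` argument shown under Configuration Options:

```python
import torch
from simple_orchestrator import SimpleMultiModelOrchestrator

# Use the GPU when present, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
orchestrator = SimpleMultiModelOrchestrator(device=device)
```
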
## 🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for:

- New child model integrations
- Performance improvements
- Bug fixes
- Documentation enhancements
- Feature requests

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- **CLIP-GPT2 Model**: [kunaliitkgp09/clip-gpt2-image-captioner](https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner)
- **Stable Diffusion Model**: [kunaliitkgp09/flickr30k-text-to-image](https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image)
- **Hugging Face**: For providing the model hosting platform
- **PyTorch**: For the deep learning framework
- **Transformers**: For the model loading and processing utilities

## 📚 References

1. **CLIP**: "Learning Transferable Visual Models From Natural Language Supervision" (Radford et al., 2021)
2. **GPT-2**: "Language Models are Unsupervised Multitask Learners" (Radford et al., 2019)
3. **Stable Diffusion**: "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)
4. **Flickr30k**: "From Image Descriptions to Visual Denotations" (Young et al., 2014)

## 🔗 Links

- **CLIP-GPT2 Model**: https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner
- **Flickr30k Text-to-Image**: https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image
- **Hugging Face Hub**: https://huggingface.co/
- **PyTorch**: https://pytorch.org/
- **Transformers**: https://huggingface.co/docs/transformers/

---

**Happy Orchestrating! 🚀**