Multi-Model Orchestrator: Parent-Child LLM System
A sophisticated multi-model orchestration system that manages parent-child LLM relationships, specifically integrating the CLIP-GPT2 Image Captioner and Flickr30k Text-to-Image models.
🚀 Features
Parent Orchestrator
- Intelligent Task Routing: Automatically routes tasks to the appropriate child model (see the routing sketch below)
- Model Management: Handles loading, caching, and lifecycle of child models
- Error Handling: Robust error handling and recovery mechanisms
- Task History: Comprehensive logging and monitoring of all operations
- Async Support: Both synchronous and asynchronous processing modes
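As a concrete illustration of the routing idea, here is a minimal dispatch sketch; the class and handler names are hypothetical stand-ins for the real implementation in multi_model_orchestrator.py:

```python
# Minimal routing sketch (hypothetical names; see multi_model_orchestrator.py
# for the full implementation with caching, history, and error recovery).
class MiniOrchestrator:
    def __init__(self):
        # Map task types to handler callables (child models register here)
        self.handlers = {}

    def add_child_model(self, task_type, handler):
        self.handlers[task_type] = handler

    def route_task(self, task_type, payload):
        # Fail fast on unknown task types instead of guessing
        if task_type not in self.handlers:
            raise ValueError(f"No child model registered for task '{task_type}'")
        return self.handlers[task_type](payload)

# Usage: register a handler, then route by task type
mini = MiniOrchestrator()
mini.add_child_model("caption", lambda image_path: f"caption for {image_path}")
print(mini.route_task("caption", "sample_image.jpg"))
```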
Child Models
- CLIP-GPT2 Image Captioner: Converts images to descriptive text captions
- Flickr30k Text-to-Image: Generates images from text descriptions
- Extensible Architecture: Easy to add new child models
Advanced Capabilities
- Multimodal Processing: Combines multiple child models for complex tasks
- Batch Processing: Handle multiple tasks efficiently
- Performance Monitoring: Track processing times and success rates
- Memory Management: Efficient GPU/CPU memory usage
📁 Project Structure
```
├── multi_model_orchestrator.py   # Advanced orchestrator with full features
├── simple_orchestrator.py        # Simplified interface matching the original code
├── multi_model_example.py        # Comprehensive examples and demonstrations
├── multi_model_requirements.txt  # Dependencies for the multi-model system
└── MULTI_MODEL_README.md         # This file
```
🛠️ Installation
- Install dependencies:

```bash
pip install -r multi_model_requirements.txt
```

- Verify the installation:

```python
import torch
from transformers import CLIPProcessor
from diffusers import StableDiffusionPipeline

print("All dependencies installed successfully!")
```
🎯 Quick Start
Basic Usage (Matching Original Code)
```python
from simple_orchestrator import SimpleMultiModelOrchestrator

# Initialize the orchestrator and load both child models
orchestrator = SimpleMultiModelOrchestrator()
orchestrator.initialize_models()

# Generate a caption from an image
caption = orchestrator.generate_caption("sample_image.jpg")
print(f"Caption: {caption}")

# Generate an image from text
image_path = orchestrator.generate_image("A beautiful sunset over mountains")
print(f"Generated image: {image_path}")

# Route tasks explicitly by task type
caption = orchestrator.route_task("caption", "sample_image.jpg")
image_path = orchestrator.route_task("generate_image", "A cat on a windowsill")
```
Advanced Usage
```python
import asyncio

from multi_model_orchestrator import MultiModelOrchestrator

async def main():
    # Initialize the advanced orchestrator
    orchestrator = MultiModelOrchestrator()
    await orchestrator.initialize()

    # Multimodal processing: caption the image and generate a new one
    results = await orchestrator.process_multimodal(
        image_path="sample_image.jpg",
        text_prompt="A serene landscape with mountains",
    )
    print("Results:", results)

asyncio.run(main())
```
🔧 Model Integration
Child Model 1: CLIP-GPT2 Image Captioner
- Model: `kunaliitkgp09/clip-gpt2-image-captioner`
- Task: Image-to-text captioning
- Input: Image file path
- Output: Descriptive text caption
- Performance: ~40% accuracy on test samples
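For reference, calling the captioner directly might look like the sketch below. This is a hedged sketch that assumes the checkpoint follows the standard transformers VisionEncoderDecoderModel layout; if the repository uses a custom head, use the orchestrator's generate_caption instead:

```python
# Hedged sketch: assumes a standard VisionEncoderDecoderModel checkpoint.
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

model_id = "kunaliitkgp09/clip-gpt2-image-captioner"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preprocess the image and decode the generated caption tokens
image = Image.open("sample_image.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```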
Child Model 2: Flickr30k Text-to-Image
- Model: `kunaliitkgp09/flickr30k-text-to-image`
- Task: Text-to-image generation
- Input: Text prompt
- Output: Generated image file
- Performance: Fine-tuned on Flickr30k dataset
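To bypass the orchestrator, a fine-tuned Stable Diffusion checkpoint can usually be loaded with diffusers directly. A minimal sketch, assuming the repository follows the standard pipeline layout:

```python
# Hedged sketch: assumes a standard Stable Diffusion pipeline layout on the Hub.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("kunaliitkgp09/flickr30k-text-to-image")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Generate and save a single image from a text prompt
image = pipe("A beautiful sunset over mountains", num_inference_steps=30).images[0]
image.save("generated_image.png")
```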
📊 Usage Examples
1. Image Captioning
```python
# Generate a caption from an image
caption = orchestrator.generate_caption("path/to/image.jpg")
print(f"Generated Caption: {caption}")
```
2. Text-to-Image Generation
```python
# Generate an image from text
image_path = orchestrator.generate_image("A majestic eagle soaring over mountains")
print(f"Generated Image: {image_path}")
```
3. Multimodal Processing
```python
# Process an image and a text prompt together
results = orchestrator.process_multimodal_task(
    image_path="sample_image.jpg",
    text_prompt="A serene landscape with mountains",
)
print("Caption:", results["caption"])
print("Generated Image:", results["generated_image"])
print("Analysis:", results["analysis_prompt"])
```
4. Async Processing
```python
# Async version for concurrent workloads
async def async_example():
    results = await orchestrator.process_multimodal_async(
        image_path="sample_image.jpg",
        text_prompt="A futuristic cityscape",
    )
    return results
```
5. Batch Processing
```python
# Process multiple generation tasks in sequence
image_tasks = [
    "A beautiful sunset",
    "A cozy coffee shop",
    "A vibrant garden",
]

for prompt in image_tasks:
    image_path = orchestrator.generate_image(prompt)
    print(f"Generated: {image_path}")
```
🔍 Task History and Monitoring
```python
# Get the orchestrator status
status = orchestrator.get_status()
print(f"Status: {status}")

# Inspect the task history
history = orchestrator.get_task_history()
for task in history:
    print(f"Task: {task['task_type']}, Time: {task['processing_time']:.2f}s")

# Persist the task history to disk
orchestrator.save_task_history("my_tasks.json")
```
⚙️ Configuration Options
Model Configuration
```python
# Choose the device explicitly ("cuda" or "cpu")
orchestrator = SimpleMultiModelOrchestrator(device="cuda")

# Custom generation parameters
image_path = orchestrator.generate_image(
    "A beautiful landscape",
    output_path="custom_output.png",
)
```
Async Configuration
```python
# Async orchestrator with concurrent processing
async_orchestrator = AsyncMultiModelOrchestrator()

# Process tasks concurrently (run inside an async function or event loop)
results = await async_orchestrator.process_multimodal_async(
    image_path="image.jpg",
    text_prompt="prompt",
)
```
🎯 Use Cases
1. Content Creation
- Generate captions for social media images
- Create images from text descriptions
- Multimodal content analysis
2. Research and Development
- Model performance comparison
- Multimodal AI research
- Prototype development
3. Production Systems
- Automated content generation
- Image analysis pipelines
- Text-to-image applications
4. Educational Applications
- AI model demonstration
- Multimodal learning systems
- Research toolkits
🔧 Advanced Features
Error Handling
```python
try:
    caption = orchestrator.generate_caption("image.jpg")
except Exception as e:
    print(f"Error: {e}")
    # Handle the error gracefully (log, retry, or fall back)
```
Performance Optimization
```python
import asyncio

# Run independent tasks concurrently for better throughput
async def optimized_processing():
    tasks = [
        orchestrator.generate_caption_async("image1.jpg"),
        orchestrator.generate_caption_async("image2.jpg"),
        orchestrator.generate_image_async("prompt1"),
        orchestrator.generate_image_async("prompt2"),
    ]
    results = await asyncio.gather(*tasks)
    return results
```
Custom Model Integration
```python
# Add a new child model by implementing a simple process() interface
class CustomChildModel:
    def __init__(self, model_name):
        self.model_name = model_name
        self.model = None  # Replace with real model-loading code for your backend

    def process(self, input_data):
        # Custom processing logic goes here; echoes the input as a placeholder
        return input_data

# Register the new child model with the orchestrator
orchestrator.add_child_model("custom_model", CustomChildModel("model_name"))
```
📈 Performance Metrics
The orchestrator tracks several performance metrics (see the aggregation sketch after this list):
- Processing Time: Time taken for each task
- Success Rate: Percentage of successful operations
- Memory Usage: GPU/CPU memory consumption
- Model Load Times: Time to initialize each child model
- Task Throughput: Number of tasks processed per second
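For example, the records returned by get_task_history() can be aggregated into summary metrics. This is a minimal sketch; the processing_time field is shown above, while the success flag is an assumption about the record schema:

```python
# Aggregate task history into summary metrics.
# Assumes each record has "processing_time" (shown above) and a "success"
# flag (an assumption about the record schema).
history = orchestrator.get_task_history()

if history:
    total_time = sum(task["processing_time"] for task in history)
    success_rate = sum(1 for task in history if task.get("success", True)) / len(history)
    print(f"Tasks: {len(history)}")
    print(f"Average processing time: {total_time / len(history):.2f}s")
    print(f"Success rate: {success_rate:.0%}")
```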
🚨 Important Notes
System Requirements
- GPU: Recommended for optimal performance (CUDA compatible)
- RAM: 8GB+ for smooth operation
- Storage: 5GB+ for model downloads and generated content
- Python: 3.8+ required
Model Downloads
- Models are downloaded automatically on first use
- CLIP-GPT2: ~500MB
- Stable Diffusion: ~4GB
- Total initial download: ~5GB
Memory Management
- Models are loaded into GPU memory when available
- A CPU fallback is available for systems without a GPU (see the device-selection sketch below)
- Memory usage scales with batch size and model complexity
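The GPU-with-CPU-fallback behavior can be selected explicitly with standard PyTorch calls; a minimal sketch:

```python
import torch

from simple_orchestrator import SimpleMultiModelOrchestrator

# Pick the best available device; fall back to CPU when no GPU is present
device = "cuda" if torch.cuda.is_available() else "cpu"
orchestrator = SimpleMultiModelOrchestrator(device=device)

# After large batches, release cached GPU memory back to the driver
if device == "cuda":
    torch.cuda.empty_cache()
```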
🤝 Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues for:
- New child model integrations
- Performance improvements
- Bug fixes
- Documentation enhancements
- Feature requests
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- CLIP-GPT2 Model: kunaliitkgp09/clip-gpt2-image-captioner
- Stable Diffusion Model: kunaliitkgp09/flickr30k-text-to-image
- Hugging Face: For providing the model hosting platform
- PyTorch: For the deep learning framework
- Transformers: For the model loading and processing utilities
📚 References
- CLIP: "Learning Transferable Visual Models From Natural Language Supervision" (Radford et al., 2021)
- GPT-2: "Language Models are Unsupervised Multitask Learners" (Radford et al., 2019)
- Stable Diffusion: "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)
- Flickr30k: "From Image Descriptions to Visual Denotations" (Young et al., 2014)
🔗 Links
- CLIP-GPT2 Model: https://huggingface.co/kunaliitkgp09/clip-gpt2-image-captioner
- Flickr30k Text-to-Image: https://huggingface.co/kunaliitkgp09/flickr30k-text-to-image
- Hugging Face Hub: https://huggingface.co/
- PyTorch: https://pytorch.org/
- Transformers: https://huggingface.co/docs/transformers/
Happy Orchestrating! 🚀