
Multi-Model Orchestrator: Parent-Child LLM System

A sophisticated multi-model orchestration system that manages parent-child LLM relationships, specifically integrating the CLIP-GPT2 Image Captioner and Flickr30k Text-to-Image models.

🚀 Features

Parent Orchestrator

  • Intelligent Task Routing: Automatically routes tasks to the appropriate child model (see the dispatch sketch after this list)
  • Model Management: Handles loading, caching, and lifecycle of child models
  • Error Handling: Robust error handling and recovery mechanisms
  • Task History: Comprehensive logging and monitoring of all operations
  • Async Support: Both synchronous and asynchronous processing modes
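
As a concrete illustration of the routing idea, here is a minimal dispatch sketch. The task names ("caption", "generate_image") and handler names match the simple orchestrator's public interface shown later; the internal structure is an assumption for illustration, not the actual implementation.

# Illustrative routing sketch (not the actual implementation)
class RoutingSketch:
    def __init__(self):
        # Map task types to the child-model handlers that serve them
        self.routes = {
            "caption": self.generate_caption,
            "generate_image": self.generate_image,
        }

    def route_task(self, task_type, payload):
        handler = self.routes.get(task_type)
        if handler is None:
            raise ValueError(f"Unknown task type: {task_type}")
        return handler(payload)

    def generate_caption(self, image_path):
        ...  # delegate to the CLIP-GPT2 captioner child model

    def generate_image(self, prompt):
        ...  # delegate to the Flickr30k text-to-image child model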

Child Models

  • CLIP-GPT2 Image Captioner: Converts images to descriptive text captions
  • Flickr30k Text-to-Image: Generates images from text descriptions
  • Extensible Architecture: Easy to add new child models

Advanced Capabilities

  • Multimodal Processing: Combines multiple child models for complex tasks
  • Batch Processing: Handle multiple tasks efficiently
  • Performance Monitoring: Track processing times and success rates
  • Memory Management: Efficient GPU/CPU memory usage

📁 Project Structure

├── multi_model_orchestrator.py    # Advanced orchestrator with full features
├── simple_orchestrator.py         # Simplified interface matching original code
├── multi_model_example.py         # Comprehensive examples and demonstrations
├── multi_model_requirements.txt   # Dependencies for multi-model system
└── MULTI_MODEL_README.md          # This file

🛠️ Installation

  1. Install dependencies (a plausible requirements sketch follows below):

pip install -r multi_model_requirements.txt

  2. Verify the installation:

import torch
from transformers import CLIPProcessor
from diffusers import StableDiffusionPipeline
print("All dependencies installed successfully!")

🎯 Quick Start

Basic Usage (Matching Original Code)

from simple_orchestrator import SimpleMultiModelOrchestrator

# Initialize orchestrator
orchestrator = SimpleMultiModelOrchestrator()
orchestrator.initialize_models()

# Generate caption from image
caption = orchestrator.generate_caption("sample_image.jpg")
print(f"Caption: {caption}")

# Generate image from text
image_path = orchestrator.generate_image("A beautiful sunset over mountains")
print(f"Generated image: {image_path}")

# Route tasks
caption = orchestrator.route_task("caption", "sample_image.jpg")
image_path = orchestrator.route_task("generate_image", "A cat on a windowsill")

Advanced Usage

from multi_model_orchestrator import MultiModelOrchestrator
import asyncio

async def main():
    # Initialize advanced orchestrator
    orchestrator = MultiModelOrchestrator()
    await orchestrator.initialize()
    
    # Multimodal processing
    results = await orchestrator.process_multimodal(
        image_path="sample_image.jpg",
        text_prompt="A serene landscape with mountains"
    )
    
    print("Results:", results)

asyncio.run(main())

🔧 Model Integration

Child Model 1: CLIP-GPT2 Image Captioner

  • Model: kunaliitkgp09/clip-gpt2-image-captioner
  • Task: Image-to-text captioning
  • Input: Image file path
  • Output: Descriptive text caption
  • Performance: ~40% accuracy on test samples

Child Model 2: Flickr30k Text-to-Image

  • Model: kunaliitkgp09/flickr30k-text-to-image
  • Task: Text-to-image generation
  • Input: Text prompt
  • Output: Generated image file
  • Performance: Fine-tuned on the Flickr30k dataset (a direct-loading sketch follows this list)
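
A minimal sketch of calling this child model directly with diffusers, assuming the checkpoint is published in the standard StableDiffusionPipeline format (the orchestrator's generate_image presumably wraps an equivalent call):

from diffusers import StableDiffusionPipeline

# Load the fine-tuned checkpoint (assumes it is stored in diffusers format)
pipe = StableDiffusionPipeline.from_pretrained("kunaliitkgp09/flickr30k-text-to-image")
pipe = pipe.to("cuda")  # optional: move to GPU if one is available

# Generate an image from a text prompt and save it to disk
image = pipe("A dog running through a field").images[0]
image.save("generated_image.png")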

📊 Usage Examples

1. Image Captioning

# Generate caption from image
caption = orchestrator.generate_caption("path/to/image.jpg")
print(f"Generated Caption: {caption}")

2. Text-to-Image Generation

# Generate image from text
image_path = orchestrator.generate_image("A majestic eagle soaring over mountains")
print(f"Generated Image: {image_path}")

3. Multimodal Processing

# Process both image and text together
results = orchestrator.process_multimodal_task(
    image_path="sample_image.jpg",
    text_prompt="A serene landscape with mountains"
)

print("Caption:", results["caption"])
print("Generated Image:", results["generated_image"])
print("Analysis:", results["analysis_prompt"])

4. Async Processing

# Async version for better performance
async def async_example():
    results = await orchestrator.process_multimodal_async(
        image_path="sample_image.jpg",
        text_prompt="A futuristic cityscape"
    )
    return results

5. Batch Processing

# Process multiple tasks
image_tasks = [
    "A beautiful sunset",
    "A cozy coffee shop",
    "A vibrant garden"
]

for prompt in image_tasks:
    image_path = orchestrator.generate_image(prompt)
    print(f"Generated: {image_path}")

🔍 Task History and Monitoring

# Get orchestrator status
status = orchestrator.get_status()
print(f"Status: {status}")

# Get task history
history = orchestrator.get_task_history()
for task in history:
    print(f"Task: {task['task_type']}, Time: {task['processing_time']:.2f}s")

# Save task history
orchestrator.save_task_history("my_tasks.json")

⚙️ Configuration Options

Model Configuration

# Custom model parameters
orchestrator = SimpleMultiModelOrchestrator(device="cuda")  # or "cpu"

# Custom generation parameters
image_path = orchestrator.generate_image(
    "A beautiful landscape",
    output_path="custom_output.png"
)

Async Configuration

# Async orchestrator with concurrent processing
async_orchestrator = AsyncMultiModelOrchestrator()

# Process tasks concurrently
results = await async_orchestrator.process_multimodal_async(
    image_path="image.jpg",
    text_prompt="prompt"
)

🎯 Use Cases

1. Content Creation

  • Generate captions for social media images
  • Create images from text descriptions
  • Multimodal content analysis

2. Research and Development

  • Model performance comparison
  • Multimodal AI research
  • Prototype development

3. Production Systems

  • Automated content generation
  • Image analysis pipelines
  • Text-to-image applications

4. Educational Applications

  • AI model demonstration
  • Multimodal learning systems
  • Research toolkits

🔧 Advanced Features

Error Handling

try:
    caption = orchestrator.generate_caption("image.jpg")
except Exception as e:
    print(f"Error: {e}")
    # Handle error gracefully

Performance Optimization

# Use async for better performance
async def optimized_processing():
    tasks = [
        orchestrator.generate_caption_async("image1.jpg"),
        orchestrator.generate_caption_async("image2.jpg"),
        orchestrator.generate_image_async("prompt1"),
        orchestrator.generate_image_async("prompt2")
    ]
    
    results = await asyncio.gather(*tasks)
    return results

Custom Model Integration

# Add new child models
class CustomChildModel:
    def __init__(self, model_name):
        # Load the underlying model here (placeholder: replace with real loading code)
        self.model = load_model(model_name)

    def process(self, input_data):
        # Custom processing logic goes here
        result = self.model(input_data)
        return result

# Register the new child model with the orchestrator
orchestrator.add_child_model("custom_model", CustomChildModel("model_name"))

📈 Performance Metrics

The orchestrator tracks various performance metrics; a sketch for aggregating them from the task history follows this list:

  • Processing Time: Time taken for each task
  • Success Rate: Percentage of successful operations
  • Memory Usage: GPU/CPU memory consumption
  • Model Load Times: Time to initialize each child model
  • Task Throughput: Number of tasks processed per second
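
As an illustration, most of these can be derived from the task history. The sketch below assumes each history record carries processing_time, task_type, and a success flag, which may differ from the actual schema.

# Hypothetical aggregation over task-history records
history = orchestrator.get_task_history()

total = len(history)
avg_time = sum(t["processing_time"] for t in history) / max(total, 1)
success_rate = sum(1 for t in history if t.get("success", True)) / max(total, 1)

print(f"Tasks processed: {total}")
print(f"Average processing time: {avg_time:.2f}s")
print(f"Success rate: {success_rate:.0%}")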

🚨 Important Notes

System Requirements

  • GPU: Recommended for optimal performance (CUDA compatible)
  • RAM: 8GB+ for smooth operation
  • Storage: 5GB+ for model downloads and generated content
  • Python: 3.8+ required

Model Downloads

  • Models are downloaded automatically on first use
  • CLIP-GPT2: ~500MB
  • Stable Diffusion: ~4GB
  • Total initial download: ~5GB

Memory Management

  • Models are loaded into GPU memory when available
  • CPU fallback available for systems without a GPU (see the device-selection sketch after this list)
  • Memory usage scales with batch size and model complexity
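
A minimal sketch of the device-selection pattern this implies; whether the orchestrator also uses memory-saving options such as diffusers' enable_attention_slicing is an assumption.

import torch
from diffusers import StableDiffusionPipeline

# Prefer the GPU when available, fall back to the CPU otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained("kunaliitkgp09/flickr30k-text-to-image")
pipe = pipe.to(device)

# Optional: trade some speed for a lower peak GPU memory footprint
if device == "cuda":
    pipe.enable_attention_slicing()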

🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for:

  • New child model integrations
  • Performance improvements
  • Bug fixes
  • Documentation enhancements
  • Feature requests

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 References

  1. CLIP: "Learning Transferable Visual Models From Natural Language Supervision" (Radford et al., 2021)
  2. GPT-2: "Language Models are Unsupervised Multitask Learners" (Radford et al., 2019)
  3. Stable Diffusion: "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)
  4. Flickr30k: "From Image Descriptions to Visual Denotations" (Young et al., 2014)


Happy Orchestrating! 🚀