General Model Enhancement via ICM-DPO with Comprehensive LoRA
🚀 Overview
This model demonstrates comprehensive capability enhancement by training a high-capacity LoRA adapter with Direct Preference Optimization (DPO) on ICM-generated preference pairs. This is Recipe #6 from the Ellora project - a collection of standardized recipes for enhancing LLM capabilities.
🔧 Key Features
- 🎯 Comprehensive LoRA: Targets all major linear layers with rank 32 for maximum capacity enhancement
- 📊 ICM-Generated Preferences: Uses Internal Coherence Maximization for completely label-free preference data generation
- ⚡ DPO Training: Direct preference optimization without requiring a separate reward model
- 🌐 General Purpose: Enhances capabilities across diverse tasks (reasoning, coding, creative writing, etc.)
- 💾 Memory Efficient: Uses gradient checkpointing and 8-bit optimizer for efficient training
📊 Model Configuration
- Base Model: google/gemma-3-270m-it
- LoRA Rank: 32
- LoRA Alpha: 64
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training Method: Direct Preference Optimization (DPO)
- Beta (KL Penalty): 0.5
- Trainable Parameters: ~56.14% of base model
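The adapter settings above correspond roughly to the following peft configuration (a minimal sketch; the dropout value and task_type are assumptions, not taken from the recipe):
from peft import LoraConfig
# Sketch of the adapter configuration described above
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,  # assumed value, not specified in this card
    task_type="CAUSAL_LM",
)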
📈 Training Details
Dataset
- Source: codelion/gemma-3-270m-icm-dpo
- Method: ICM (Internal Coherence Maximization) for label-free preference generation
- Training Samples: 46044
- Evaluation Samples: 50
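The preference data can be pulled directly from the Hub; a minimal sketch, assuming the usual DPO column layout of prompt / chosen / rejected:
from datasets import load_dataset
# ICM-generated preference pairs (column names assumed to follow
# the standard prompt/chosen/rejected DPO convention)
dataset = load_dataset("codelion/gemma-3-270m-icm-dpo", split="train")
print(dataset.column_names)
print(dataset[0])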
Training Configuration
- Epochs: 3
- Batch Size: 2 (per device)
- Gradient Accumulation: 8 steps
- Effective Batch Size: 16
- Learning Rate: 5e-06
- Optimizer: paged_adamw_8bit
- Memory Optimization: BF16, Gradient Checkpointing
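Put together, the settings above map roughly onto the following trl setup (a sketch, assuming a recent trl release with DPOConfig and processing_class; argument names differ in older versions). It reuses base_model, tokenizer, dataset, and lora_config from the sketches above:
from trl import DPOConfig, DPOTrainer
# Sketch of the training arguments listed above
training_args = DPOConfig(
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 16
    learning_rate=5e-6,
    beta=0.5,                        # KL penalty strength
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    output_dir="gemma-3-270m-icm-dpo-lora",
)
trainer = DPOTrainer(
    model=base_model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=lora_config,
)
trainer.train()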
🔧 Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")
# Load the enhanced model
model = PeftModel.from_pretrained(base_model, "codelion/gemma-3-270m-icm-dpo-lora")
# Generate enhanced responses
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
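Since google/gemma-3-270m-it is an instruction-tuned model, wrapping the prompt with the tokenizer's chat template generally gives better-formatted responses; a minimal sketch using the same model and tokenizer as above:
# Format the prompt as a chat turn instead of raw text
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=150, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
For deployment without the peft dependency, the adapter can also be folded into the base weights with model.merge_and_unload().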
🎯 Capabilities Enhanced
This model shows improvements across multiple domains:
- 🧠 Reasoning: Logical thinking, mathematical problem solving
- ✍️ Creative Writing: Story generation, poetry, descriptive text
- 💻 Code Generation: Python, JavaScript, SQL code creation
- ❓ Question Answering: Factual responses, explanations
- 🔧 Problem Solving: Step-by-step solutions, systematic thinking
- 📋 Instruction Following: Adherence to specific formatting and requirements
🔬 Methodology: ICM + DPO
ICM (Internal Coherence Maximization)
ICM generates preference pairs without human annotation by:
- Creating diverse prompts across multiple domains
- Generating multiple responses per prompt
- Using systematic evaluation to rank responses
- Creating (prompt, chosen, rejected) preference pairs
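The pairing step at the end of this pipeline is simple once responses are ranked; a hypothetical sketch (the function names and scoring callable are illustrative, not the actual ICM implementation):
# Hypothetical sketch of turning ranked responses into DPO preference pairs.
# score_response stands in for ICM's coherence-based ranking and is not a real API.
def build_preference_pair(prompt, responses, score_response):
    ranked = sorted(responses, key=score_response, reverse=True)
    # Pair the best-ranked response against the worst-ranked one
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}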
DPO (Direct Preference Optimization)
DPO directly optimizes the model to:
- Increase probability of chosen responses
- Decrease probability of rejected responses
- Maintain similarity to reference model (KL constraint)
- Learn preferences without reward model training
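Concretely, these four points fall out of the standard DPO objective, which only needs per-sequence log-probabilities under the policy and the frozen reference model; a minimal sketch of the loss:
import torch
import torch.nn.functional as F
def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.5):
    # Log-ratios of policy vs. reference for chosen and rejected responses
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected, scaled by beta (the KL penalty)
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()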
📊 Expected Benefits
- ✅ Enhanced Quality: Better responses across all task types
- ✅ Label-Free Training: No manual preference annotation required
- ✅ Comprehensive Coverage: All major model components enhanced
- ✅ Memory Efficient: ~56.14% trainable parameters vs full fine-tuning
- ✅ Reproducible: Standardized recipe from Ellora project
🏷️ Related Resources
- 📚 Ellora Project: github.com/codelion/ellora
- 🔄 ICM Repository: github.com/codelion/icm
- 📊 Training Dataset: codelion/gemma-3-270m-icm-dpo
- 🤖 Base Model: google/gemma-3-270m-it
- 📄 DPO Paper: Direct Preference Optimization
💡 Innovation Summary
This recipe demonstrates how to enhance model capabilities comprehensively without any manual labeling:
- 🎯 ICM generates diverse, high-quality preference pairs automatically
- ⚡ DPO optimizes preferences directly without reward model complexity
- 🔧 Comprehensive LoRA maximizes enhancement while maintaining efficiency
- 🌐 Multi-domain training improves general capabilities, not just specific tasks
This adapter is part of the Ellora project - standardized recipes for enhancing LLM capabilities. Recipe #6 demonstrates label-free general enhancement via ICM + DPO.