---
license: mit
language: en
tags:
- mixture-of-experts
- moe
- coding
- code-generation
- fine-tuned
- lora
- instruction
- python
- adbhutmoe
datasets:
- TokenBender/code_instructions_122k_alpaca_style
model_type: mixtral
base_model: rohitnagareddy/AdbhutMOE
---
# AdbhutMOE-Coding-Finetuned - Fine-tuned Coding Assistant
This model is a fine-tuned version of the `rohitnagareddy/AdbhutMOE` Mixture-of-Experts (MoE) model, specialized for Python code generation and programming assistance tasks. It combines the efficiency of a sparse MoE architecture with domain-specific fine-tuning for coding applications.
## 💻 Model Description
- **Base Model**: `rohitnagareddy/AdbhutMOE` (Custom MoE Architecture)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: `TokenBender/code_instructions_122k_alpaca_style` - A comprehensive dataset of coding instructions and solutions
- **Architecture**: Mixture-of-Experts with selective expert activation
- **Training**: Optimized for instruction-based code generation with memory-efficient techniques
## 🏗️ Architecture Details
This model is based on a custom Mixture-of-Experts architecture:
- **Experts per Layer**: 8 experts with 2 activated per token
- **Hidden Dimension**: 256
- **Attention Heads**: 4
- **Layers**: 4
- **Vocabulary**: Custom-trained tokenizer (~8K tokens)
- **Max Sequence Length**: 512 tokens
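For reference, the snippet below sketches how these numbers could be expressed as a `MixtralConfig` (the card declares `model_type: mixtral`). The `intermediate_size` and `num_key_value_heads` values are illustrative assumptions, not taken from this card; the authoritative values live in the repository's `config.json`.

```python
from transformers import MixtralConfig, MixtralForCausalLM

# Illustrative configuration mirroring the numbers listed above.
# intermediate_size and num_key_value_heads are assumptions for this sketch.
config = MixtralConfig(
    vocab_size=8192,               # ~8K custom tokenizer
    hidden_size=256,               # hidden dimension
    num_hidden_layers=4,           # transformer layers
    num_attention_heads=4,         # attention heads
    num_key_value_heads=4,         # assumption: no grouped-query attention
    intermediate_size=1024,        # assumption: 4x hidden size per expert FFN
    num_local_experts=8,           # experts per layer
    num_experts_per_tok=2,         # experts activated per token
    max_position_embeddings=512,   # max sequence length
)

# Randomly initialized; built here only to inspect the parameter count.
model = MixtralForCausalLM(config)
print(f"Total parameters: {model.num_parameters():,}")
```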
## ⚠️ Important Considerations
- **Verify All Code**: Generated code may contain errors or be suboptimal. Always test and review thoroughly.
- **Security**: Generated code has not been vetted for security vulnerabilities.
- **Educational Model**: This is a proof-of-concept model demonstrating MoE fine-tuning techniques.
- **Limited Training**: Model was trained with limited resources for demonstration purposes.
## 🚀 Usage
### Basic Text Generation
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Create a text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Generate code
prompt = '''### Instruction:
Write a Python function that takes a list of integers and returns the sum of all even numbers in the list.
### Response:'''

response = pipe(prompt, max_new_tokens=150, temperature=0.2, do_sample=True)
print(response[0]["generated_text"])
```
### Direct Model Usage
```python
# For more control over generation
prompt = '''### Instruction:
Create a Python class for a simple calculator with basic arithmetic operations.
### Response:'''

# Move inputs to the same device as the model (needed with device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
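### Prompt Formatting Helper
Because fine-tuning used Alpaca-style instruction/response pairs, the generated text includes the prompt itself. The helper below is a convenience sketch (not part of the released code) that reuses the `pipe` object from above, wraps an instruction in the `### Instruction:` / `### Response:` format shown earlier, and returns only the model's answer.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the '### Instruction / ### Response' format used above."""
    return f"### Instruction:\n{instruction}\n### Response:"

def extract_response(generated_text: str) -> str:
    """Return only the text produced after the '### Response:' marker."""
    marker = "### Response:"
    if marker in generated_text:
        return generated_text.split(marker, 1)[-1].strip()
    return generated_text.strip()

prompt = build_prompt("Write a Python function that reverses a string.")
output = pipe(prompt, max_new_tokens=100, temperature=0.2, do_sample=True)
print(extract_response(output[0]["generated_text"]))
```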
## 📊 Training Details
### Fine-tuning Configuration
- **Training Steps**: 500 (limited for demonstration)
- **Batch Size**: 1 (with 8 gradient accumulation steps)
- **Learning Rate**: 1e-4
- **Optimizer**: Paged AdamW 8-bit
- **LoRA Rank**: 8
- **LoRA Alpha**: 16
- **Target Modules**: All linear layers including MoE experts and gates
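As a rough sketch, this configuration could be expressed with PEFT and `transformers` as shown below. The target module names assume the standard Mixtral layer naming (`q_proj`, `k_proj`, `v_proj`, `o_proj`, the router `gate`, and the expert layers `w1`/`w2`/`w3`), and the dropout, logging, precision, and output path settings are illustrative; none of these are confirmed by the card.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Assumed module names follow the standard Mixtral layout
# (attention projections plus the MoE router and expert FFN layers).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,                 # assumption: not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention
        "gate",                                    # MoE router
        "w1", "w2", "w3",                          # expert feed-forward layers
    ],
)

training_args = TrainingArguments(
    output_dir="adbhutmoe-coding-lora",  # hypothetical output path
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,                    # assumption: not stated in the card
    fp16=True,                           # assumption: mixed precision
)
```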
### Base Model Training
- **Pre-training Data**: AG News dataset sample
- **Architecture**: Custom Mixtral-based MoE
- **Training Steps**: 100 (base model pre-training)
## 🎯 Performance Notes
- **Efficiency**: Only 2 of the 8 experts are active per token, so each forward pass uses a fraction of the total parameters
- **Memory**: Optimized for memory-efficient inference and training (fp16 inference and 4-bit quantization support)
- **Speed**: Sparse expert activation enables faster inference than a dense model with the same total parameter count
## 🔄 Model Lineage
1. **Base Architecture**: Custom Mixtral MoE implementation
2. **Pre-training**: Trained on AG News dataset sample
3. **Fine-tuning**: LoRA adaptation on coding instruction dataset
4. **Optimization**: 4-bit quantization support for efficient deployment (see the loading sketch below)
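A minimal 4-bit loading sketch using `bitsandbytes` follows; the quantization settings are common defaults rather than values confirmed for this model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"

# 4-bit NF4 quantization via bitsandbytes; these settings are reasonable
# defaults, not values taken from the original training or release setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```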
## 📈 Intended Use Cases
- **Code Generation**: Creating Python functions and classes
- **Programming Education**: Demonstrating coding concepts
- **Research**: Studying MoE architectures for domain-specific tasks
- **Prototyping**: Quick code snippet generation
## 🚫 Limitations
- **Limited Scope**: Primarily trained on basic coding tasks
- **Language Focus**: Optimized for Python; support for other languages is limited
- **Scale**: Small model size limits complex reasoning capabilities
- **Training Data**: Limited training iterations due to resource constraints
## 🤝 Contributing
This model serves as a foundation for further experimentation with MoE architectures in code generation. Contributions and improvements are welcome!
---
*Fine-tuned by rohitnagareddy using LoRA on the AdbhutMOE architecture.*
*This model demonstrates the application of parameter-efficient fine-tuning to Mixture-of-Experts models.*