Add model card for fine-tuned AdbhutMOE
README.md
ADDED
@@ -0,0 +1,158 @@
---
license: mit
language: en
tags:
- mixture-of-experts
- moe
- coding
- code-generation
- fine-tuned
- lora
- instruction
- python
- adbhutmoe
datasets:
- TokenBender/code_instructions_122k_alpaca_style
model_type: mixtral
base_model: rohitnagareddy/AdbhutMOE
---

# AdbhutMOE-Coding-Finetuned - Fine-tuned Coding Assistant

This model is a fine-tuned version of the `rohitnagareddy/AdbhutMOE` Mixture-of-Experts (MoE) model, specialized for Python code generation and programming assistance tasks. It combines the efficiency of a sparse MoE architecture with domain-specific fine-tuning for coding applications.

## 💻 Model Description

- **Base Model**: `rohitnagareddy/AdbhutMOE` (custom MoE architecture)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: `TokenBender/code_instructions_122k_alpaca_style`, a dataset of coding instructions and solutions
- **Architecture**: Mixture-of-Experts with selective expert activation
- **Training**: Optimized for instruction-based code generation with memory-efficient techniques

## 🏗️ Architecture Details

This model is based on a custom Mixture-of-Experts architecture (a configuration sketch follows the list):
- **Experts per Layer**: 8 experts, with 2 activated per token
- **Hidden Dimension**: 256
- **Attention Heads**: 4
- **Layers**: 4
- **Vocabulary**: custom-trained tokenizer (~8K tokens)
- **Max Sequence Length**: 512 tokens

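As a rough orientation, the numbers above map onto a `transformers` MoE configuration along the following lines. This is a minimal sketch that assumes the architecture is expressed through `MixtralConfig` (suggested by `model_type: mixtral` in the metadata); the `intermediate_size`, exact `vocab_size`, and `num_key_value_heads` values are illustrative assumptions, so check the repository's `config.json` for the authoritative settings.

```python
from transformers import MixtralConfig

# Illustrative configuration mirroring the card's architecture list.
# vocab_size, intermediate_size and num_key_value_heads are assumptions;
# the repository's config.json is the source of truth.
config = MixtralConfig(
    vocab_size=8000,              # "~8K tokens" custom tokenizer
    hidden_size=256,              # hidden dimension
    intermediate_size=512,        # assumed expert FFN width
    num_hidden_layers=4,          # layers
    num_attention_heads=4,        # attention heads
    num_key_value_heads=4,        # assumes no grouped-query attention
    num_local_experts=8,          # experts per layer
    num_experts_per_tok=2,        # top-2 routing
    max_position_embeddings=512,  # max sequence length
)
print(config)
```

With top-2 routing, only 2 of the 8 expert feed-forward blocks run for each token, which is where the efficiency claims in the Performance Notes below come from.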
## ⚠️ Important Considerations

- **Verify All Code**: Generated code may contain errors or be suboptimal. Always test and review thoroughly.
- **Security**: Generated code has not been vetted for security vulnerabilities.
- **Educational Model**: This is a proof-of-concept model demonstrating MoE fine-tuning techniques.
- **Limited Training**: Model was trained with limited resources for demonstration purposes.

## 🚀 Usage

### Basic Text Generation

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Create a text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Generate code
prompt = '''### Instruction:
Write a Python function that takes a list of integers and returns the sum of all even numbers in the list.

### Response:'''

response = pipe(prompt, max_new_tokens=150, temperature=0.2, do_sample=True)
print(response[0]["generated_text"])
```

### Direct Model Usage

```python
# For more control over generation (reuses the model and tokenizer loaded above)
prompt = '''### Instruction:
Create a Python class for a simple calculator with basic arithmetic operations.

### Response:'''

# Move inputs to the same device as the model (needed with device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

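Both examples hand-write the Alpaca-style prompt used by the fine-tuning dataset. A small helper can keep that template consistent; this is a sketch based on the common Alpaca format (instruction plus optional input). Whether the model saw an `### Input:` block during training is an assumption, so prefer instruction-only prompts if results look off.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Format a request in the Alpaca style used by the fine-tuning dataset.

    The optional ### Input: block follows the common Alpaca convention; its use
    during this model's training is an assumption.
    """
    if input_text:
        return (
            "### Instruction:\n"
            f"{instruction}\n\n"
            "### Input:\n"
            f"{input_text}\n\n"
            "### Response:"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:"


# Reuses `pipe` from the Basic Text Generation example above.
prompt = build_prompt("Write a Python function that reverses a string.")
response = pipe(prompt, max_new_tokens=100, temperature=0.2, do_sample=True)
print(response[0]["generated_text"])
```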
## 📊 Training Details

### Fine-tuning Configuration

The fine-tuning run used the following settings (a configuration sketch follows the list):
- **Training Steps**: 500 (limited for demonstration)
- **Batch Size**: 1 (with 8 gradient accumulation steps)
- **Learning Rate**: 1e-4
- **Optimizer**: Paged AdamW 8-bit
- **LoRA Rank**: 8
- **LoRA Alpha**: 16
- **Target Modules**: All linear layers, including MoE experts and gates

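For readers who want to reproduce a similar run, the settings above translate roughly into the following `peft`/`transformers` configuration. This is a minimal sketch under stated assumptions: the exact `target_modules` names, LoRA dropout, precision flags, and output directory are not recorded in the card, so those values are illustrative.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA settings mirroring the card: rank 8, alpha 16, adapters on the attention
# projections plus the MoE expert and gate linears. The concrete module names
# below are assumptions (Mixtral-style); confirm them via model.named_modules().
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,  # assumed; not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate", "w1", "w2", "w3"],
)

# Trainer settings mirroring the card: 500 steps, batch size 1 with 8
# gradient-accumulation steps, lr 1e-4, paged 8-bit AdamW.
training_args = TrainingArguments(
    output_dir="adbhutmoe-coding-finetuned",  # illustrative path
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    fp16=True,
)
```

In a typical setup these objects would be handed to `trl`'s `SFTTrainer` (or the plain `Trainer`) together with the quantized base model and the formatted instruction dataset.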
### Base Model Training

- **Pre-training Data**: AG News dataset sample (see the loading sketch below)
- **Architecture**: Custom Mixtral-based MoE
- **Training Steps**: 100 (base model pre-training)

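For context, such a sample can be pulled with the `datasets` library. The slice size below is an arbitrary illustrative choice, since the card does not state how large the pre-training sample actually was.

```python
from datasets import load_dataset

# Small AG News slice; the actual sample size used for pre-training is not
# stated in the card, so "train[:1%]" is an arbitrary choice.
ag_news_sample = load_dataset("ag_news", split="train[:1%]")
print(len(ag_news_sample), ag_news_sample[0]["text"][:100])
```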
## 🎯 Performance Notes

- **Efficiency**: Only 2 of the 8 experts run per token, so compute per token scales with the active experts rather than with the full parameter count
- **Memory**: Optimized for memory-efficient inference and training (fp16 inference and optional 4-bit loading)
- **Speed**: Sparse expert activation enables faster inference than a dense model with a comparable total parameter count

## 🔄 Model Lineage

1. **Base Architecture**: Custom Mixtral MoE implementation
2. **Pre-training**: Trained on an AG News dataset sample
3. **Fine-tuning**: LoRA adaptation on the coding instruction dataset
4. **Optimization**: 4-bit quantization support for efficient deployment (see the loading sketch below)

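Lineage step 4 mentions 4-bit quantization support. A minimal sketch of the usual `bitsandbytes` 4-bit loading path through `transformers` is shown below; whether this particular custom architecture loads cleanly in 4-bit is an assumption worth verifying.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"

# Standard NF4 4-bit setup; requires a CUDA GPU and the bitsandbytes package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```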
## 📈 Intended Use Cases

- **Code Generation**: Creating Python functions and classes
- **Programming Education**: Demonstrating coding concepts
- **Research**: Studying MoE architectures for domain-specific tasks
- **Prototyping**: Quick code snippet generation

## 🚫 Limitations

- **Limited Scope**: Primarily trained on basic coding tasks
- **Language Focus**: Optimized for Python; support for other languages is limited
- **Scale**: The small model size limits complex reasoning capabilities
- **Training Data**: Limited training iterations due to resource constraints

## 🤝 Contributing

This model serves as a foundation for further experimentation with MoE architectures in code generation. Contributions and improvements are welcome!

---
*Fine-tuned by rohitnagareddy using LoRA on the AdbhutMOE architecture.*
*This model demonstrates the application of parameter-efficient fine-tuning to Mixture-of-Experts models.*