rohitnagareddy committed
Commit b488d79 · verified · 1 Parent(s): 63baed0

Add model card for fine-tuned AdbhutMOE

Files changed (1)
  1. README.md +158 -0
README.md ADDED
@@ -0,0 +1,158 @@
---
license: mit
language: en
tags:
- mixture-of-experts
- moe
- coding
- code-generation
- fine-tuned
- lora
- instruction
- python
- adbhutmoe
datasets:
- TokenBender/code_instructions_122k_alpaca_style
model_type: mixtral
base_model: rohitnagareddy/AdbhutMOE
---

# AdbhutMOE-Coding-Finetuned - Fine-tuned Coding Assistant

This model is a fine-tuned version of the `rohitnagareddy/AdbhutMOE` Mixture-of-Experts (MoE) model, specialized for Python code generation and programming-assistance tasks. It combines the efficiency of a sparse MoE architecture with domain-specific fine-tuning for coding applications.

## 💻 Model Description

- **Base Model**: `rohitnagareddy/AdbhutMOE` (custom MoE architecture)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: `TokenBender/code_instructions_122k_alpaca_style` - a comprehensive dataset of coding instructions and solutions (see the loading sketch below)
- **Architecture**: Mixture-of-Experts with selective expert activation
- **Training**: Optimized for instruction-based code generation with memory-efficient techniques

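The dataset follows the Alpaca instruction format. A minimal sketch of loading it with 🤗 Datasets is shown below; the field names (`instruction`, `output`) and the `train` split are assumptions based on the Alpaca style, not guaranteed by this card.

```python
# Minimal sketch: load the fine-tuning dataset and build Alpaca-style prompts.
# Field names (instruction/output) and the split are assumptions from the Alpaca format.
from datasets import load_dataset

dataset = load_dataset("TokenBender/code_instructions_122k_alpaca_style", split="train")

def build_prompt(example):
    # Mirror the "### Instruction / ### Response" template used in the Usage section.
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}"
    return {"text": prompt}

dataset = dataset.map(build_prompt)
print(dataset[0]["text"][:300])
```
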
## 🏗️ Architecture Details

This model is based on a custom Mixture-of-Experts architecture (see the configuration sketch after this list):
- **Experts per Layer**: 8 experts, with 2 activated per token
- **Hidden Dimension**: 256
- **Attention Heads**: 4
- **Layers**: 4
- **Vocabulary**: Custom-trained tokenizer (~8K tokens)
- **Max Sequence Length**: 512 tokens

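As a rough illustration, the hyperparameters above map onto a standard `transformers` `MixtralConfig` roughly as follows. This is a sketch only: the actual custom configuration (loaded via `trust_remote_code`) may differ, and `intermediate_size`, `num_key_value_heads`, and the exact vocabulary size are assumptions not stated in this card.

```python
# Sketch only: approximate the card's hyperparameters with a standard MixtralConfig.
# intermediate_size, num_key_value_heads, and vocab_size=8000 are assumptions.
from transformers import MixtralConfig

config = MixtralConfig(
    vocab_size=8000,              # ~8K-token custom tokenizer
    hidden_size=256,              # hidden dimension
    intermediate_size=512,        # assumed; not stated in the card
    num_hidden_layers=4,          # layers
    num_attention_heads=4,        # attention heads
    num_key_value_heads=4,        # assumed equal to attention heads
    num_local_experts=8,          # experts per layer
    num_experts_per_tok=2,        # experts activated per token
    max_position_embeddings=512,  # max sequence length
)
print(config)
```
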
## ⚠️ Important Considerations

- **Verify All Code**: Generated code may contain errors or be suboptimal. Always test and review thoroughly.
- **Security**: Generated code has not been vetted for security vulnerabilities.
- **Educational Model**: This is a proof-of-concept model demonstrating MoE fine-tuning techniques.
- **Limited Training**: Model was trained with limited resources for demonstration purposes.

## 🚀 Usage

### Basic Text Generation

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Create a text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Generate code
prompt = '''### Instruction:
Write a Python function that takes a list of integers and returns the sum of all even numbers in the list.

### Response:'''

response = pipe(prompt, max_new_tokens=150, temperature=0.2, do_sample=True)
print(response[0]["generated_text"])
```

### Direct Model Usage

```python
# For more control over generation (reuses model and tokenizer from above)
prompt = '''### Instruction:
Create a Python class for a simple calculator with basic arithmetic operations.

### Response:'''

# Move inputs to the model's device (device_map="auto" may place it on GPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.3,
        top_p=0.9,
        do_sample=True,
        # Fall back to EOS if the tokenizer defines no pad token
        pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## 📊 Training Details

### Fine-tuning Configuration

- **Training Steps**: 500 (limited for demonstration)
- **Batch Size**: 1 (with 8 gradient accumulation steps)
- **Learning Rate**: 1e-4
- **Optimizer**: Paged AdamW 8-bit
- **LoRA Rank**: 8
- **LoRA Alpha**: 16
- **Target Modules**: All linear layers, including the MoE experts and gates (a hedged LoRA/trainer sketch follows this list)

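A minimal sketch of how these settings map onto `peft` and `transformers`, assuming a standard LoRA setup. The `target_modules` names depend on the custom MoE implementation and are assumptions here; `output_dir`, `lora_dropout`, and the logging cadence are placeholders not taken from this card.

```python
# Sketch only: LoRA + trainer settings mirroring the configuration above.
# target_modules are assumptions; inspect the model to confirm the actual layer names.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                            # LoRA rank
    lora_alpha=16,                  # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "w1", "w2", "w3", "gate"],  # assumed attention/expert/gate layers
    lora_dropout=0.05,              # assumed; not stated in the card
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="adbhutmoe-coding-finetuned",  # placeholder path
    max_steps=500,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    optim="paged_adamw_8bit",       # requires bitsandbytes
    logging_steps=10,
)
```
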
### Base Model Training

- **Pre-training Data**: AG News dataset sample
- **Architecture**: Custom Mixtral-based MoE
- **Training Steps**: 100 (base model pre-training)

## 🎯 Performance Notes

- **Efficiency**: The MoE architecture provides parameter efficiency while maintaining performance
- **Memory**: Optimized for memory-efficient inference and training
- **Speed**: Sparse activation patterns enable faster inference compared to dense models of similar capability

## 🔄 Model Lineage

1. **Base Architecture**: Custom Mixtral MoE implementation
2. **Pre-training**: Trained on an AG News dataset sample
3. **Fine-tuning**: LoRA adaptation on the coding instruction dataset
4. **Optimization**: 4-bit quantization support for efficient deployment (see the loading sketch below)

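For the 4-bit deployment path mentioned above, a hedged loading sketch using `bitsandbytes` through `transformers` is shown below; the quantization settings (`nf4`, fp16 compute) are common defaults and assumptions, not this model's exact configuration.

```python
# Sketch only: load the model in 4-bit with bitsandbytes; settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "rohitnagareddy/AdbhutMOE-Coding-Finetuned"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumed quantization type
    bnb_4bit_compute_dtype=torch.float16, # assumed compute dtype
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```
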
## 📈 Intended Use Cases

- **Code Generation**: Creating Python functions and classes
- **Programming Education**: Demonstrating coding concepts
- **Research**: Studying MoE architectures for domain-specific tasks
- **Prototyping**: Quick code snippet generation

## 🚫 Limitations

- **Limited Scope**: Primarily trained on basic coding tasks
- **Language Focus**: Optimized for Python, with limited support for other languages
- **Scale**: Small model size limits complex reasoning capabilities
- **Training Data**: Limited training iterations due to resource constraints

## 🤝 Contributing

This model serves as a foundation for further experimentation with MoE architectures in code generation. Contributions and improvements are welcome!

---
*Fine-tuned by rohitnagareddy using LoRA on the AdbhutMOE architecture.*
*This model demonstrates the application of parameter-efficient fine-tuning to Mixture-of-Experts models.*