forge1825
/

FStudent

@@ -1,23 +1,169 @@
-# Merged Phi-3 Model with Distilled Knowledge
-This model is a merged version of the microsoft/Phi-3-mini-4k-instruct base model with LoRA adapters trained through knowledge distillation.
-## Model Details
-- Base Model: microsoft/Phi-3-mini-4k-instruct
-- Adapter Path: ./distilled_model_final
-- Merged On: 2025-05-04 17:57:45
-## Usage
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-# Load the model and tokenizer
-model = AutoModelForCausalLM.from_pretrained("forge1825/FStudent")
-tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")
-# Generate text
-input_text = "Hello, I am"
-inputs = tokenizer(input_text, return_tensors="pt")
-outputs = model.generate(**inputs, max_length=50)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
-```

+---
+language:
+  - en
+license: mit
+tags:
+  - phi-3
+  - distillation
+  - knowledge-distillation
+  - lora
+  - code-generation
+  - python
+datasets:
+  - Shuu12121/python-codesearch-dataset-open
+model-index:
+  - name: FStudent
+    results:
+      - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          type: custom
+          name: Distillation Evaluation
+        metrics:
+          - name: Speedup Factor
+            type: speedup
+            value: 2.5x
+            verified: false
+---
+# FStudent: Distilled Phi-3 Model
+FStudent is a knowledge-distilled version of Microsoft's Phi-3-mini-4k-instruct model, trained through a comprehensive distillation pipeline that combines teacher-student learning with self-study mechanisms.
+## Model Description
+FStudent was created using a multi-stage distillation pipeline that transfers knowledge from a larger teacher model (Phi-4) to the smaller Phi-3-mini-4k-instruct model. The model was trained using LoRA adapters, which were then merged with the base model to create this standalone version.
+### Training Data
+The model was trained on a diverse set of data sources:
+1. **PDF Documents**: Technical documentation and domain-specific knowledge
+2. **Python Code Dataset**: Code examples from the [Shuu12121/python-codesearch-dataset-open](https://huggingface.co/datasets/Shuu12121/python-codesearch-dataset-open) dataset
+3. **Teacher-Generated Examples**: High-quality examples generated by the Phi-4 teacher model
+### Training Process
+The distillation pipeline consisted of six sequential steps:
+1. **Content Extraction & Enrichment**: PDF files were processed to extract and enrich text data
+2. **Teacher Pair Generation**: Training pairs were generated using the Phi-4 teacher model
+3. **Distillation Training**: The student model (Phi-3) was trained using LoRA adapters with the following parameters:
+   - Learning rate: 1e-4
+   - Batch size: 4
+   - Gradient accumulation steps: 8
+   - Mixed precision training
+   - 4-bit quantization during training
+4. **Model Merging**: The trained LoRA adapters were merged with the base Phi-3 model
+5. **Student Self-Study**: The model performed self-directed learning on domain-specific content
+6. **Model Evaluation**: The model was evaluated against the teacher model for performance
+### Model Architecture
+- **Base Model**: microsoft/Phi-3-mini-4k-instruct
+- **Parameter-Efficient Fine-Tuning**: LoRA adapters (merged into this model)
+- **Context Length**: 4K tokens
+- **Architecture**: Transformer-based language model
+## Intended Uses
+This model is designed for:
+- General text generation tasks
+- Python code understanding and generation
+- Technical documentation analysis
+- Question answering on domain-specific topics
+## Performance and Limitations
+### Strengths
+- Faster inference compared to larger models (approximately 2.5x speedup)
+- Maintains much of the capability of the teacher model
+- Enhanced code understanding due to training on Python code datasets
+- Good performance on technical documentation analysis
+### Limitations
+- May not match the full capabilities of larger models on complex reasoning tasks
+- Limited context window compared to some larger models
+- Performance on specialized domains not covered in training data may be reduced
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# Load the model and tokenizer
+model = AutoModelForCausalLM.from_pretrained("forge1825/FStudent")
+tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")
+# Generate text
+input_text = "Write a Python function to calculate the Fibonacci sequence:"
+inputs = tokenizer(input_text, return_tensors="pt")
+outputs = model.generate(**inputs, max_length=512)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+### Quantized Usage
+For more efficient inference, you can load the model with quantization:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+# 4-bit quantization configuration
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.float16
+)
+# Load the model with quantization
+model = AutoModelForCausalLM.from_pretrained(
+    "forge1825/FStudent",
+    device_map="auto",
+    quantization_config=quantization_config
+)
+tokenizer = AutoTokenizer.from_pretrained("forge1825/FStudent")
+```
+## Training Details
+- **Training Framework**: Hugging Face Transformers with PEFT
+- **Optimizer**: AdamW
+- **Learning Rate Schedule**: Linear warmup followed by linear decay
+- **Training Hardware**: NVIDIA GPUs
+- **Distillation Method**: Knowledge distillation with teacher-student architecture
+- **Self-Study Mechanism**: Curiosity-driven exploration with hierarchical context
+## Ethical Considerations
+This model inherits the capabilities and limitations of its base model (Phi-3-mini-4k-instruct). While efforts have been made to ensure responsible behavior, the model may still:
+- Generate incorrect or misleading information
+- Produce biased content reflecting biases in the training data
+- Create code that contains bugs or security vulnerabilities
+Users should validate and review the model's outputs, especially for sensitive applications.
+## Citation and Attribution
+If you use this model in your research or applications, please cite:
+```
+@misc{forge1825_fstudent,
+  author = {Forge1825},
+  title = {FStudent: Distilled Phi-3 Model},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/forge1825/FStudent}}
+}
+```
+## Acknowledgements
+- Microsoft for the Phi-3-mini-4k-instruct base model
+- Hugging Face for the infrastructure and tools
+- The creators of the Python code dataset used in training