Safetensors
GGUF
English
chain-of-thought
step-by-step-reasoning
systematic-research-planning
academic-assistant
thesis-planning
dissertation-planning
research-question-formulation
literature-review-planning
methodology-design
experimental-design
hypothesis-generation
research-proposal-helper
cross-disciplinary-research
student-research-assistant
phd-support
research-gap-analysis
literature-analysis
research-summarization
structured-output
systematic-analysis
problem-decomposition
actionable-planning
scientific-research
social-science-research
engineering-research
humanities-research
ai-research-assistant
research-automation
Research-Reasoner-7B-v0.3
Research-Reasoner-7B
Research-Reasoner
conversational
Research-Reasoner-7B-v0.3 Training Documentation
===================================================

Model Training Details
---------------------
Base Model: Mistral 7B Instruct v0.3
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Infrastructure: Single NVIDIA A100 PCIe GPU
Training Duration: Approximately 3.8 hours
Training Dataset: Custom curated dataset for research planning
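The training code itself is not reproduced in this card. As a minimal sketch, and assuming the public Hub checkpoint mistralai/Mistral-7B-Instruct-v0.3 together with the transformers library (neither is stated explicitly above), the base model could be loaded for fine-tuning along these lines:

```python
# Hypothetical sketch: loading the base model for LoRA fine-tuning.
# The Hub ID and library choice are assumptions; the actual training
# script is not published in this documentation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,   # FP16 training, per the hardware section below
    device_map="auto",           # single A100 PCIe GPU
)
model.gradient_checkpointing_enable()  # trade compute for memory on a 40 GB card
```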

Dataset Specifications
---------------------
Total Token Count: 5,840,200
Total Sample Count: 5,750
Average Tokens/Sample: 1,015.69
Dataset Creation: Generated using DeepSeek-V3 API
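The average above follows directly from the totals (5,840,200 / 5,750 ≈ 1,015.69). A minimal sketch of how such statistics could be gathered with the base model's tokenizer is shown below; the dataset file name and the "text" field are hypothetical placeholders, not taken from this documentation.

```python
# Hypothetical sketch: computing dataset token statistics.
# "research_planning_dataset.jsonl" and the "text" field are assumptions;
# only the resulting totals are reported above.
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

total_tokens = 0
total_samples = 0
with open("research_planning_dataset.jsonl") as f:
    for line in f:
        sample = json.loads(line)
        total_tokens += len(tokenizer(sample["text"])["input_ids"])
        total_samples += 1

print(f"Total tokens: {total_tokens:,}")                              # reported: 5,840,200
print(f"Total samples: {total_samples:,}")                            # reported: 5,750
print(f"Average tokens/sample: {total_tokens / total_samples:.2f}")   # reported: 1,015.69
```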

Training Configuration
---------------------
LoRA Parameters:
- Rank: 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head

Training Hyperparameters:
- Learning Rate: 5e-5
- Batch Size: 4
- Gradient Accumulation Steps: 5
- Effective Batch Size: 20 (4 × 5)
- Max Sequence Length: 2048
- Epochs: 3
- Warmup Ratio: 0.01
- Weight Decay: 0.01
- Max Grad Norm: 1.0
- LR Scheduler: Cosine
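Expressed with the Hugging Face peft and transformers APIs (an assumption; only the values come from the list above), this configuration corresponds roughly to:

```python
# Hypothetical sketch: the LoRA and trainer settings listed above, expressed
# with the peft / transformers APIs. Library choice is assumed, values are not.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                      # Rank
    lora_alpha=64,             # Alpha
    lora_dropout=0.1,          # Dropout
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "lm_head",
    ],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="research-reasoner-7b-v0.3",   # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=5,            # effective batch size 4 x 5 = 20
    num_train_epochs=3,
    warmup_ratio=0.01,
    weight_decay=0.01,
    max_grad_norm=1.0,
    lr_scheduler_type="cosine",
    fp16=True,                                # see Hardware & Environment below
    gradient_checkpointing=True,
)

# The 2048-token max sequence length would be enforced at tokenization/packing
# time rather than through TrainingArguments. Adapters would then be attached
# to the loaded base model with peft.get_peft_model(model, lora_config).
```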

Hardware & Environment
---------------------
GPU: NVIDIA A100 PCIe (40 GB)
Operating System: Ubuntu
CUDA Version: 11.8
PyTorch Version: 2.7.0
Compute Capability: 8.0
Optimizations: FP16 mixed precision, gradient checkpointing
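For reference, these environment details can be confirmed at runtime with standard PyTorch calls; the snippet below is illustrative only and not part of the original training scripts.

```python
# Illustrative environment check (not from the original training scripts).
import torch

print(torch.__version__)                    # reported: 2.7.0
print(torch.version.cuda)                   # reported: 11.8
print(torch.cuda.get_device_name(0))        # e.g. an NVIDIA A100 PCIe 40 GB
print(torch.cuda.get_device_capability(0))  # reported: (8, 0)
```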

Training Performance
---------------------
Training Runtime: 3.87 hours (13,936 seconds)
Train Samples/Second: 1.176
Train Steps/Second: 0.059
Training Loss (Final): 0.137
Validation Loss (Final): 0.230
Total Training Steps: 822
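The runtime and step-rate figures are mutually consistent, as a quick arithmetic check shows:

```python
# Quick consistency check of the reported throughput figures.
runtime_seconds = 13_936
total_steps = 822

print(runtime_seconds / 3600)          # ~3.87 hours, as reported
print(total_steps / runtime_seconds)   # ~0.059 steps/second, as reported
```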