Research-Reasoner-7B-v0.3 Training Documentation
===================================================
Model Training Details
---------------------
Base Model: Mistral 7B Instruct v0.3
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Infrastructure: Single NVIDIA A100 PCIe GPU
Training Duration: Approximately 3.9 hours (see Training Performance)
Training Dataset: Custom-curated dataset for research planning
Dataset Specifications
---------------------
Total Token Count: 5,840,200
Total Sample Count: 5,750
Average Tokens/Sample: 1,015.69
Dataset Creation: Generated using DeepSeek-V3 API
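
As a quick sanity check, the average-tokens figure follows directly from the two totals above (a minimal sketch; the values are copied from this document, not recomputed from the dataset itself):

    # Consistency check on the dataset figures listed above.
    total_tokens = 5_840_200
    total_samples = 5_750

    avg_tokens_per_sample = total_tokens / total_samples
    print(f"{avg_tokens_per_sample:.2f}")  # -> 1015.69, matching the figure above
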
Training Configuration
---------------------
LoRA Parameters:
- Rank: 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head
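
Expressed as a PEFT configuration, the LoRA settings above would look roughly like the following. This is a sketch assuming the Hugging Face peft library; the actual training script is not included here, and the task_type is an assumption:

    from peft import LoraConfig

    # Sketch of the LoRA setup documented above, using the Hugging Face peft API.
    lora_config = LoraConfig(
        r=32,                    # LoRA rank
        lora_alpha=64,           # scaling factor (alpha)
        lora_dropout=0.1,
        target_modules=[
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj", "lm_head",
        ],
        task_type="CAUSAL_LM",   # assumption: standard causal-LM fine-tuning
    )
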
Training Hyperparameters:
- Learning Rate: 5e-5
- Batch Size: 4
- Gradient Accumulation: 5
- Effective Batch Size: 20
- Max Sequence Length: 2048
- Epochs: 3
- Warmup Ratio: 0.01
- Weight Decay: 0.01
- Max Grad Norm: 1.0
- LR Scheduler: Cosine
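
The hyperparameters above map onto a standard Hugging Face TrainingArguments setup roughly as follows. This is a sketch based on the documented values, not the original training script; the output directory is hypothetical, and the FP16 and gradient-checkpointing options from the Hardware & Environment section are included for completeness:

    from transformers import TrainingArguments

    # Sketch of the documented hyperparameters via the transformers Trainer API.
    # Effective batch size = 4 (per device) x 5 (accumulation steps) = 20.
    # The max sequence length (2048) is applied at tokenization time, not here.
    training_args = TrainingArguments(
        output_dir="research-reasoner-7b-lora",  # hypothetical output path
        learning_rate=5e-5,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=5,
        num_train_epochs=3,
        warmup_ratio=0.01,
        weight_decay=0.01,
        max_grad_norm=1.0,
        lr_scheduler_type="cosine",
        fp16=True,                               # mixed-precision training
        gradient_checkpointing=True,             # trade compute for memory
    )
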
Hardware & Environment
---------------------
GPU: NVIDIA A100 PCIe (40GB)
Operating System: Ubuntu
CUDA Version: 11.8
PyTorch Version: 2.7.0
Compute Capability: 8.0
Optimization: FP16, Gradient Checkpointing
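
A short PyTorch check can confirm the GPU, CUDA build, and compute capability reported above (a sketch; the commented outputs are indicative, not captured logs):

    import torch

    # Environment check corresponding to the values listed above.
    print(torch.__version__)                    # e.g. 2.7.0
    print(torch.version.cuda)                   # CUDA build, e.g. 11.8
    print(torch.cuda.get_device_name(0))        # e.g. NVIDIA A100-PCIE-40GB
    print(torch.cuda.get_device_capability(0))  # (8, 0) for A100
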
Training Performance
---------------------
Training Runtime: 3.87 hours (13,936 seconds)
Train Samples/Second: 1.176
Train Steps/Second: 0.059
Training Loss (Final): 0.137
Validation Loss (Final): 0.230
Total Training Steps: 822
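
The throughput figures are internally consistent: steps per second equals samples per second divided by the effective batch size, and the runtime in hours follows from the runtime in seconds. A small check using the values above:

    # Consistency check on the reported throughput (values copied from above).
    samples_per_second = 1.176
    effective_batch_size = 20

    steps_per_second = samples_per_second / effective_batch_size
    print(round(steps_per_second, 3))        # -> 0.059, matching the reported value

    runtime_seconds = 13_936
    print(round(runtime_seconds / 3600, 2))  # -> 3.87 hours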