MaterialsAnalyst-AI-7B Training Documentation
================================================

Model Training Details
---------------------

Base Model:               Qwen2.5-7B-Instruct
Fine-tuning Method:       LoRA (Low-Rank Adaptation)
Training Infrastructure:  Single NVIDIA A100 SXM4 GPU
Training Duration:        Approximately 5.4 hours
Training Dataset:         Custom curated dataset for materials analysis

Dataset Specifications
---------------------

Total Token Count:        6,441,671
Total Sample Count:       6,000
Average Tokens/Sample:    1,073.61
Dataset Creation:         Generated using the DeepSeek-V3 API
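
As a quick sanity check, the average in this section can be recomputed from the two totals. This is a minimal sketch; the variable names are illustrative and are not taken from the training code.

```python
# Values copied from the Dataset Specifications section.
total_tokens = 6_441_671
total_samples = 6_000

# Mean tokens per sample, rounded to two decimals as in the listing.
avg_tokens_per_sample = round(total_tokens / total_samples, 2)

print(f"Average tokens/sample: {avg_tokens_per_sample:,.2f}")
```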

Training Configuration
---------------------

LoRA Parameters:
- Rank:                   32
- Alpha:                  64
- Dropout:                0.1
- Target Modules:         q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head
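
To make the rank and alpha values concrete, the sketch below computes the scaling factor that the standard LoRA formulation applies to the low-rank update, and the number of trainable parameters one adapter adds to a single linear layer. The helper name `lora_param_count` and the 3584 x 3584 example shape are assumptions for illustration only; the real layer shapes should be read from the base model's configuration.

```python
# LoRA hyperparameters copied from the list above.
rank = 32
alpha = 64

# In the standard LoRA formulation, the update B @ A is scaled by alpha / rank.
scaling = alpha / rank


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r), so the total is
    r * d_in + d_out * r.
    """
    return r * (d_in + d_out)


# ASSUMPTION: 3584 x 3584 is an illustrative projection shape, not a value
# stated in this document. Check the base model's config before relying on it.
example_params = lora_param_count(d_in=3584, d_out=3584, r=rank)

print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"Adapter parameters for one example layer: {example_params:,}")
```

With alpha set to twice the rank, the adapter update is scaled by 2, a common choice when the rank is raised without retuning the learning rate.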

Training Hyperparameters:
- Learning Rate:          5e-5
- Batch Size:             4
- Gradient Accumulation:  5
- Effective Batch Size:   20
- Max Sequence Length:    2048
- Epochs:                 3
- Warmup Ratio:           0.01
- Weight Decay:           0.01
- Max Grad Norm:          1.0
- LR Scheduler:           Cosine
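
The hyperparameters above imply an effective batch size and a learning-rate curve that can be sketched in a few lines. This is an illustrative reimplementation of a linear-warmup, cosine-decay schedule, not the training script itself. It assumes that warmup steps are rounded up from the warmup ratio and that the rate decays to zero at the final step. The step count of 855 is taken from the Training Performance section.

```python
import math

# Hyperparameters copied from this document.
learning_rate = 5e-5
per_device_batch_size = 4
grad_accum_steps = 5
warmup_ratio = 0.01
total_steps = 855  # reported under "Training Performance"

# Single GPU, so the effective batch is batch size x accumulation steps.
effective_batch_size = per_device_batch_size * grad_accum_steps

# ASSUMPTION: warmup steps are rounded up from total_steps * warmup_ratio.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"Effective batch size: {effective_batch_size}")
print(f"Warmup steps: {warmup_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

With a warmup ratio this small, the schedule reaches its peak rate within the first few steps and spends almost all of training on the cosine decay.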

Hardware & Environment
---------------------

GPU:                      NVIDIA A100 SXM4 (40GB)
Operating System:         Ubuntu
CUDA Version:             11.8
PyTorch Version:          2.7.0
Compute Capability:       8.0
Optimization:             FP16, Gradient Checkpointing

Training Performance
---------------------

Training Runtime:         5.37 hours (19,348 seconds)
Train Samples/Second:     0.884
Train Steps/Second:       0.044
Training Loss (Final):    0.170
Validation Loss (Final):  0.136
Total Training Steps:     855
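
The throughput figures above can be cross-checked against one another. The sketch below is a consistency check only; the variable names are illustrative. Note that 855 steps at an effective batch size of 20 over 3 epochs implies about 5,700 training samples per epoch, which is fewer than the 6,000 total samples listed. One plausible explanation is a held-out validation split, but this document does not state the split, so treat that as an inference.

```python
# Figures copied from this document.
runtime_seconds = 19_348
total_steps = 855
effective_batch_size = 20  # batch size 4 x gradient accumulation 5
epochs = 3

runtime_hours = runtime_seconds / 3600
steps_per_second = total_steps / runtime_seconds

# Samples seen by the optimizer across all epochs.
samples_processed = total_steps * effective_batch_size
samples_per_second = samples_processed / runtime_seconds

# INFERENCE, not stated in the document: training samples per epoch.
implied_train_samples = samples_processed / epochs

print(f"Runtime:               {runtime_hours:.2f} hours")
print(f"Steps/second:          {steps_per_second:.3f}")
print(f"Samples/second:        {samples_per_second:.3f}")
print(f"Implied samples/epoch: {implied_train_samples:.0f}")
```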