Hyperparameters used in the training
Discussion #1, opened by Viewegger
Hello,
May I ask how many epochs you trained the model for, and which hyperparameters (learning rate, etc.) were used during training?
Thank you!
Hello viewegger 👋,
Epochs: 1
Batch size: 8 (with gradient accumulation of 4)
Learning rate: 3e-5
Warmup steps: 2000
Evaluation: every 2000 steps (check this against the total number of samples so that you don't end up with only one evaluation, or none at all)
Hope this helps!
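To make the evaluation note concrete, here is a minimal sketch (standard-library Python only) that turns the quoted values into optimizer-step and evaluation counts. The `TrainConfig` class, the helper functions, and the sample counts are hypothetical. The field names only mirror Hugging Face `TrainingArguments` parameters, and the thread does not say which trainer was used. The sketch assumes a single device and approximates the step count as the number of samples divided by the effective batch size (8 × 4 = 32), rounded up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the reply; field names are illustrative (assumed to
    # mirror Hugging Face TrainingArguments, not confirmed by the thread).
    num_train_epochs: int = 1
    per_device_train_batch_size: int = 8
    gradient_accumulation_steps: int = 4
    learning_rate: float = 3e-5
    warmup_steps: int = 2000
    eval_steps: int = 2000


def optimizer_steps(num_samples: int, cfg: TrainConfig, num_devices: int = 1) -> int:
    """Approximate number of optimizer updates over the whole run."""
    # Effective batch: samples consumed per optimizer update.
    samples_per_step = (
        cfg.per_device_train_batch_size * cfg.gradient_accumulation_steps * num_devices
    )
    return math.ceil(num_samples / samples_per_step) * cfg.num_train_epochs


def num_evaluations(num_samples: int, cfg: TrainConfig, num_devices: int = 1) -> int:
    """Step-based evaluations that fire (one every eval_steps updates)."""
    return optimizer_steps(num_samples, cfg, num_devices) // cfg.eval_steps


if __name__ == "__main__":
    cfg = TrainConfig()
    # Hypothetical dataset sizes, for illustration only.
    for n in (50_000, 100_000, 500_000):
        print(n, optimizer_steps(n, cfg), num_evaluations(n, cfg))
    # 50000 1563 0
    # 100000 3125 1
    # 500000 15625 7
```

With 50,000 samples the run never reaches step 2,000, so no evaluation fires (and the 2,000 warmup steps would not complete either). At this effective batch size, roughly 64,000 samples are needed for even a single evaluation.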
Thank you! I just asked a few more questions about the Korean model in the other repo.
Greatly appreciate the info!
Viewegger changed discussion status to closed