Hyperparameters used in the training

#1
by Viewegger - opened

Hello,

May I ask how many epochs you trained the model for, and which hyperparameters (learning rate, etc.) were used during training?

Thank you!

Hello viewegger 👋,

Epochs: 1

Batch size: 8 (with gradient accumulation of 4)

Learning rate: 3e-5

Warmup steps: 2000

Evaluation: every 2000 steps (take your total number of samples into account, so that you don't end up with only one evaluation or none at all)
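To check whether these settings fit a dataset of a different size, here is a rough sketch of the arithmetic. It assumes that "steps" means optimizer updates, that the batch size of 8 is the total across devices, and that an incomplete final batch still counts as a step. The function name `training_schedule` and the sample count of 1,000,000 are made up for illustration and are not the values used for this model.

```python
import math


def training_schedule(num_samples, batch_size=8, grad_accum=4, epochs=1,
                      warmup_steps=2000, eval_every=2000):
    """Estimate step counts for the settings listed above (hypothetical helper)."""
    # Samples consumed per optimizer update: 8 * 4 = 32 with the defaults.
    effective_batch = batch_size * grad_accum
    # Assumption: a partial final batch still produces one optimizer step.
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        # Number of evaluations triggered at multiples of eval_every.
        "num_evaluations": total_steps // eval_every,
        # Share of training spent in warmup, capped at 1.0.
        "warmup_fraction": min(1.0, warmup_steps / total_steps),
    }


# Hypothetical dataset size, chosen only to illustrate the calculation.
schedule = training_schedule(num_samples=1_000_000)
print(schedule)

# A much smaller (also hypothetical) dataset never reaches step 2000,
# so it would get no evaluation and spend the whole run in warmup.
small = training_schedule(num_samples=50_000)
print(small)
```

If `num_evaluations` comes out as 0 or 1, or `warmup_fraction` is close to 1.0, lowering the evaluation interval and the warmup steps in proportion to your total step count is probably the safer choice.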

Hope this helps.

Thank you - I just asked a few more questions about the Korean model in the other repo!

Greatly appreciate the info!

Viewegger changed discussion status to closed
