We train the reward model as the maximum likelihood estimation of the Bradley-Terry model.
Totally Free + Zero Barriers + No Login Required