Datasets, code, and models for online RLHF (i.e., iterative DPO)