Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models Paper • 2504.20157 • Published Apr 28 • 38
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 164
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 246