PFPO - a chitanda Collection

PFPO

Updated Feb 6, 2025

Resources for the paper "Preference Optimization for Reasoning with Pseudo Feedback" (ICLR 2025).

Preference Optimization for Reasoning with Pseudo Feedback

Paper • 2411.16345 • Published Nov 25, 2024
chitanda/mathscale4o-800k

Dataset • Updated Feb 6, 2025 • 492k rows
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

Paper • 2402.00658 • Published Feb 1, 2024
chitanda/code-synthetic-test-cases

Dataset • Updated Feb 6, 2025