AnIdealRing
SmartDazi
AI & ML interests
None yet
Recent Activity
upvoted a paper about 17 hours ago
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
upvoted a paper 3 days ago
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
upvoted a paper 11 days ago
RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents