MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 6 days ago • 122
TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-Filter-step1200 4B • Updated 9 days ago • 23
TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-Filter-step1000 4B • Updated 9 days ago • 26
TongZheng1999/Bespoke-Stratos-17k-Init-Model-Final-Iter1-Strong-Init-Filtered-Merged Viewer • Updated 10 days ago • 45.4k • 14
OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration Paper • 2602.08344 • Published Feb 9 • 5
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning Paper • 2602.08234 • Published Feb 9 • 72
Training Data Efficiency in Multimodal Process Reward Models Paper • 2602.04145 • Published Feb 4 • 78
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published Feb 3 • 32
Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation Paper • 2602.03619 • Published Feb 3 • 27