TongZheng PRO

TongZheng1999

https://kidzheng.github.io/

AI & ML interests

Natural Language Processing

Recent Activity

upvoted a paper about 13 hours ago

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Paper • 2603.17187 • Published 6 days ago • 122

updated a model 9 days ago

TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-Filter-step1200

4B • Updated 9 days ago • 23

published a model 9 days ago

TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-Filter-step1200

4B • Updated 9 days ago • 23

updated a model 9 days ago

TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-Filter-step1000

4B • Updated 9 days ago • 26

published a model 9 days ago

TongZheng1999/Initial-Dual-Reasoning-4B-Iter1-Strong-Init-Filter-step1000

4B • Updated 9 days ago • 26

updated a dataset 10 days ago

TongZheng1999/Bespoke-Stratos-17k-Init-Model-Final-Iter1-Strong-Init-Filtered-Merged

Viewer • Updated 10 days ago • 45.4k • 14

published a dataset 10 days ago

TongZheng1999/Bespoke-Stratos-17k-Init-Model-Final-Iter1-Strong-Init-Filtered-Merged

Viewer • Updated 10 days ago • 45.4k • 14

updated a model 16 days ago

TongZheng1999/HS_Reasoning_4B_Filter_1_epoch

4B • Updated 16 days ago • 23

published a model 16 days ago

TongZheng1999/HS_Reasoning_4B_Filter_1_epoch

4B • Updated 16 days ago • 23

published 4 datasets 22 days ago

upvoted 3 papers about 1 month ago

PhyCritic: Multimodal Critic Models for Physical AI

Paper • 2602.11124 • Published Feb 11 • 53

OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration

Paper • 2602.08344 • Published Feb 9 • 5

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Paper • 2602.08234 • Published Feb 9 • 72

upvoted a paper about 2 months ago

Training Data Efficiency in Multimodal Process Reward Models

Paper • 2602.04145 • Published Feb 4 • 78

liked a Space about 2 months ago

Efficient Reasoning Online Judgement

📉

upvoted 2 papers about 2 months ago

CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs

Paper • 2602.03048 • Published Feb 3 • 32

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Paper • 2602.03619 • Published Feb 3 • 27
