Peng Wang
stillarrow
AI & ML interests
None yet
Recent Activity
liked a dataset about 13 hours ago: LLM360/guru-RL-92k
upvoted an article 15 days ago: From GRPO to DAPO and GSPO: What, Why, and How
upvoted an article about 1 month ago: Illustrating Reinforcement Learning from Human Feedback (RLHF)
Organizations
None yet