gaotang's Collections
RM-R1 (updated Jun 29, 2025)
RM-R1: Reward Modeling as Reasoning
RM-R1: Reward Modeling as Reasoning
Paper • 2505.02387 • Published May 5, 2025 • 79
gaotang/RM-R1-Entire-RLVR-Train
Viewer • Updated May 20, 2025 • 73k • 44 • 2
gaotang/RM-R1-Reasoning-RLVR
Viewer • Updated May 20, 2025 • 73k • 36 • 1
gaotang/RM-R1-Distill-SFT
Viewer • Updated May 20, 2025 • 8.75k • 51 • 2
gaotang/RM-R1-after-Distill-RLVR
Viewer • Updated May 20, 2025 • 64.2k • 41 • 1
gaotang/RM-R1-Qwen2.5-Instruct-7B
Text Generation • 8B • Updated Jun 28, 2025 • 362 • 4
gaotang/RM-R1-Qwen2.5-Instruct-14B
Text Generation • 15B • Updated Jun 28, 2025 • 69 • 1
gaotang/RM-R1-Qwen2.5-Instruct-32B
Text Generation • 33B • Updated Jun 28, 2025 • 48 • 1
gaotang/RM-R1-DeepSeek-Distilled-Qwen-7B
Text Generation • 8B • Updated Jun 28, 2025 • 12 • 2
gaotang/RM-R1-DeepSeek-Distilled-Qwen-14B
Text Generation • 15B • Updated Jun 28, 2025 • 14 • 1
gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B
Text Generation • 33B • Updated Jun 28, 2025 • 18 • 2