omrisap committed on
Commit f94103e (verified) · 1 Parent(s): 76767fe

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -22,7 +22,7 @@ model_name: TreeRPO-Qwen2.5-Math-1.5B
  A 1.5B parameter math reasoning model fine-tuned with **TreeRPO**, a hierarchical extension of GRPO that assigns rewards to “thought” nodes (not just full completions). Achieves higher GSM8K accuracy with just ~10K supervised + RL examples and **no reward model**.
 
  🔎 **Full write-up (method, math, analysis):**
- [TreeRPO: Hierarchical Credit Assignment for Data-Efficient Math Reasoning](https://omrisapir.substack.com/publish/post/167273414)
+ [TreeRPO: Hierarchical Credit Assignment for Reasoning in Language Models](https://omrisapir.substack.com/publish/post/167273414)
 
  ---
 