omrisap committed on
Commit 3aff42c · verified · 1 Parent(s): f94103e

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -28,7 +28,7 @@ A 1.5B parameter math reasoning model fine-tuned with **TreeRPO**, a hierarchica

  ## Model Details
  - **Base model:** [`Qwen/Qwen2.5-Math-1.5B`](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B)
- - **Method:** TreeRPO (tree-structured GRPO; up to depth 7; branching by entropy & length)
  - **Reward signal:** Deterministic exact-match checker (binary). Interior node rewards = mean descendant leaf rewards.
  - **Domain:** Grade-school and intermediate math word problems (GSM8K style)

@@ -45,7 +45,7 @@ Open-ended or unsafe dialog, general factual QA, or high-stakes applications.
  | Model | Greedy (%) | Maj@8 (%) | Notes |
  |---------------------------------|------------|-----------|--------------------------------------|
  | Qwen2.5-Math-1.5B-Instruct | 84.8 | 89.5 | Reported settings |
- | **TreeRPO-Qwen2.5-Math-1.5B** | **86.4** | **89.6** | Same decoding (temp 0 / (0.7, 0.8)) |

  - **Greedy:** temperature = 0 (deterministic)
  - **Maj@8:** 8 completions (temperature 0.7, top-p 0.8); majority vote on final boxed answer


  ## Model Details
  - **Base model:** [`Qwen/Qwen2.5-Math-1.5B`](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B)
+ - **Method:** TreeRPO (tree-structured GRPO;)
  - **Reward signal:** Deterministic exact-match checker (binary). Interior node rewards = mean descendant leaf rewards.
  - **Domain:** Grade-school and intermediate math word problems (GSM8K style)
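
The reward rule in the Model Details list (binary exact-match at the leaves, interior node reward = mean of descendant leaf rewards) can be illustrated with a small sketch. This is not the TreeRPO training code: the `Node` class, its fields, and both function names are hypothetical, and the exact-match check is assumed here to be a plain string comparison after trimming whitespace.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; names and fields are illustrative only."""
    leaf_reward: Optional[float] = None  # set only on leaves (0.0 or 1.0)
    children: List["Node"] = field(default_factory=list)
    reward: float = 0.0  # filled in by assign_rewards


def exact_match(predicted: str, reference: str) -> float:
    """Binary reward: 1.0 if the trimmed strings are identical, else 0.0 (assumed check)."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def assign_rewards(node: Node) -> Tuple[float, int]:
    """Set node.reward for the whole subtree.

    Returns (sum of leaf rewards, number of leaves) under `node`, so that an
    interior reward is the mean over ALL descendant leaves rather than the
    mean of its children's means.
    """
    if not node.children:
        if node.leaf_reward is None:
            raise ValueError("leaf node is missing a reward")
        node.reward = float(node.leaf_reward)
        return node.reward, 1
    total, count = 0.0, 0
    for child in node.children:
        child_total, child_count = assign_rewards(child)
        total += child_total
        count += child_count
    node.reward = total / count
    return total, count


# Unbalanced example: one branch has two leaves, the other has one.
left = Node(children=[Node(leaf_reward=exact_match("18", "18")),
                      Node(leaf_reward=exact_match("20", "18"))])
right = Node(children=[Node(leaf_reward=exact_match(" 18 ", "18"))])
root = Node(children=[left, right])
assign_rewards(root)
print(left.reward, right.reward, root.reward)
```

In this example the left branch scores 0.5, the right branch 1.0, and the root 2/3, because two of its three descendant leaves are correct. Averaging the two branch values instead would give 0.75, which is why the sketch propagates sums and leaf counts upward.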


  | Model | Greedy (%) | Maj@8 (%) | Notes |
  |---------------------------------|------------|-----------|--------------------------------------|
  | Qwen2.5-Math-1.5B-Instruct | 84.8 | 89.5 | Reported settings |
+ | **Qwen2.5-Math-1.5B-TreeRPO** | **86.4** | **89.6** | Same decoding (temp 0 / (0.7, 0.8)) |

  - **Greedy:** temperature = 0 (deterministic)
  - **Maj@8:** 8 completions (temperature 0.7, top-p 0.8); majority vote on final boxed answer
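
The Maj@8 rule described above (majority vote on the final boxed answer across 8 sampled completions) could be scored roughly as follows. This is a minimal sketch, not the evaluation script behind the reported numbers: `last_boxed` and `majority_vote` are hypothetical helper names, answers are compared as raw strings with no mathematical normalization, and the completions are assumed to have been sampled already (temperature 0.7, top-p 0.8).

```python
from collections import Counter
from typing import List, Optional

BOXED = "\\boxed{"


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last boxed expression, or None if absent or unbalanced."""
    start = text.rfind(BOXED)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(BOXED):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # ran out of text before the braces balanced


def majority_vote(completions: List[str]) -> Optional[str]:
    """Most frequent final boxed answer; completions without one are ignored."""
    answers = [a for a in map(last_boxed, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Eight made-up completions standing in for sampled model outputs.
samples = [
    r"... so the total is \boxed{42}",
    r"... therefore \boxed{42}",
    r"... first \boxed{7}, then finally \boxed{42}",
    r"... giving \boxed{41}",
    r"... \boxed{42}",
    r"... the answer is 42",  # no boxed answer, ignored
    r"... \boxed{41}",
    r"... \boxed{42}",
]
print(majority_vote(samples))
```

Here five completions end in `42`, two in `41`, and one has no boxed answer, so the vote returns `42`. A real harness would likely normalize equivalent forms (for example `0.5` and `\frac{1}{2}`) before counting, which this sketch deliberately leaves out.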