Alexandre-Numind committed on
Commit 42ed8bc · verified · 1 Parent(s): 30514bd

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -64,7 +64,9 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-t
 ## Training
 
 1. **SFT**: One-epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
-2. **RL (GRPO)**: RL phase using a structure-aware reward (5K difficult image examples).
+2. **RL (GRPO)**: RL phase using a structure-aware reward (5K difficult image examples).
+
+**The pre-GRPO model loses to the post-GRPO model 80% of the time.**
 
 
 ## Quick start: 🤗 Transformers
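The RL (GRPO) step listed in the Training section relies on group-relative advantages. GRPO samples several completions for the same input and scores each one against the others in its group, so no separate value model is needed. The sketch below shows only that normalization step. It assumes a generic scalar reward per completion; the `rewards` values are made up and stand in for the structure-aware reward, whose actual definition is not given here. It uses the population standard deviation, though some implementations use the sample estimate instead.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against its own group (same prompt/image).

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    eps avoids division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 sampled completions of one document image
# (placeholder values, not from a real structure-aware reward function).
rewards = [0.9, 0.4, 0.7, 0.2]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that score above their group's average get a positive advantage and are reinforced. Those below average get a negative one, which is how a relative reward signal can push the post-GRPO model ahead of its SFT-only starting point.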