**NuMarkdown-reasoning** is the first reasoning vision-language model trained specifically to convert documents into clean GitHub-flavoured Markdown.
It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to-Markdown pairs, followed by an RL phase (GRPO) with a layout-centric reward.

*(Note: the number of thinking tokens can vary from 20% to 500% of the number of tokens in the final answer.)*

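The 20%–500% range above can be measured per document. A minimal sketch, assuming the model wraps its reasoning in `<think>…</think>` tags before emitting the final Markdown (an assumption about the output format; verify against the model's actual chat template):

```python
import re

def thinking_ratio(output: str) -> float:
    """Ratio of thinking tokens to final-answer tokens, using whitespace
    tokenization as a rough proxy. Assumes reasoning is wrapped in
    <think>...</think> -- check the model's actual output format."""
    match = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    thinking = match.group(1) if match else ""
    # Everything outside the think block counts as the final answer.
    answer = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL)
    return len(thinking.split()) / max(len(answer.split()), 1)

sample = "<think>table has two columns check header row</think> | a | b |"
print(thinking_ratio(sample))
```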
## Results

**NuMarkdown-reasoning** is significantly better than non-reasoning models of similar size trained for Markdown generation on complex documents, and achieves competitive results against top closed-source alternatives.

### Arena ranking against popular alternatives (using the TrueSkill-2 ranking system, with around 500 anonymized votes):
<p align="center">

| Rank | Model | μ | σ | μ − 3σ |

*We plan to release a Markdown arena, similar to llmArena, for complex document-to-Markdown tasks, to provide a tool for evaluating different solutions.*

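The leaderboard above ranks by the conservative TrueSkill estimate μ − 3σ (a lower bound on skill rather than the mean). A minimal sketch of this ranking rule, using made-up (μ, σ) values rather than the actual leaderboard numbers:

```python
# Rank models by the conservative TrueSkill score mu - 3*sigma.
# The (mu, sigma) pairs below are hypothetical, for illustration only.
ratings = {
    "model-a": (28.1, 1.2),
    "model-b": (27.4, 0.8),
    "model-c": (25.0, 2.5),
}

def conservative(mu: float, sigma: float) -> float:
    """Lower bound used for ranking: a model with high mean skill but high
    uncertainty can rank below a slightly weaker but better-measured one."""
    return mu - 3 * sigma

leaderboard = sorted(ratings.items(), key=lambda kv: conservative(*kv[1]), reverse=True)
for rank, (name, (mu, sigma)) in enumerate(leaderboard, start=1):
    print(rank, name, round(conservative(mu, sigma), 2))
```

Note how `model-b` outranks `model-a` despite the lower mean, because its lower σ makes the estimate more certain.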
### Win/Draw/Lose rate against other models (image-only):
<p align="center">
<img src="bar plot.png" width="700"/>
</p>

## Training

1. **SFT**: single-epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
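The layout-centric reward itself is not specified here. As an illustrative sketch only (an assumed stand-in, not the reward actually used to train NuMarkdown-reasoning), one could score structural agreement between predicted and reference Markdown by comparing counts of layout elements:

```python
import re

def layout_reward(pred: str, ref: str) -> float:
    """Toy layout-centric reward: compare counts of Markdown structural
    elements (headings, table rows, list items) between prediction and
    reference, returning a score in [0, 1]. Illustrative only -- the
    actual training reward is not described in this model card."""
    def counts(md: str) -> dict:
        return {
            "headings": len(re.findall(r"^#{1,6} ", md, re.MULTILINE)),
            "table_rows": len(re.findall(r"^\|.*\|\s*$", md, re.MULTILINE)),
            "list_items": len(re.findall(r"^\s*[-*] ", md, re.MULTILINE)),
        }
    p, r = counts(pred), counts(ref)
    scores = []
    for key in r:
        denom = max(p[key], r[key])
        # Per-element overlap: 1.0 when counts match, 0.0 when one side
        # has the element and the other does not.
        scores.append(1.0 if denom == 0 else min(p[key], r[key]) / denom)
    return sum(scores) / len(scores)

ref = "# Title\n\n| a | b |\n| 1 | 2 |\n\n- item"
good = "# Title\n\n| a | b |\n| 1 | 2 |\n\n- item"
bad = "Title a b 1 2 item"
print(layout_reward(good, ref), layout_reward(bad, ref))
```

A count-based score like this rewards recovering the document's structure (tables, headings, lists) rather than exact token overlap, which is the general idea behind a layout-centric objective.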