liamcripwell committed
Commit 24625a0 · verified · 1 Parent(s): 6345902

Update README.md

Files changed (1): README.md (+4 −4)
@@ -29,13 +29,13 @@ pipeline_tag: text-generation
  **NuMarkdown-reasoning** is the first reasoning vision-language model trained specifically to convert documents into clean GitHub-flavoured Markdown.
  It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to-Markdown pairs, followed by an RL phase (GRPO) with a layout-centric reward.

- *(Note: the number of thinking tokens can vary from 20% to 5X the number of tokens of the final answers)*
+ *(Note: the number of thinking tokens can vary from 20% to 500% of the number of tokens in the final answer)*

  ## Results

  **NuMarkdown-reasoning** is significantly better than similar-size non-reasoning models trained for markdown generation on complex documents, and achieves competitive results against top closed-source alternatives.

- ### Arena ranking agains popular alternative (using trueskill-2 ranking system, with around 500 votes):
+ ### Arena ranking against popular alternatives (using the TrueSkill-2 ranking system, with around 500 anonymized votes):
  <p align="center">

  | Rank | Model | μ | σ | μ − 3σ |
@@ -52,7 +52,7 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to

  *We plan to release a markdown arena, similar to llmArena, for complex document-to-markdown tasks to provide a tool to evaluate different solutions.*

- ### Win/Draw/Loose-rate against others models (image-only):
+ ### Win/Draw/Lose-rate against other models (image-only):
  <p align="center">
  <img src="bar plot.png" width="700"/>
  </p>
@@ -60,7 +60,7 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to

  ## Training

- 1. **SFT**: One-epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
+ 1. **SFT**: Single-epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
  2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
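The arena table above ranks models by the conservative TrueSkill estimate μ − 3σ, which demotes models whose rating is still uncertain (high σ). A minimal sketch of that ordering, using hypothetical model names and ratings rather than the README's real data:

```python
# Hypothetical (model, mu, sigma) ratings -- not the actual arena numbers.
ratings = [
    ("model-a", 27.1, 0.9),
    ("model-b", 26.4, 2.3),
    ("model-c", 25.8, 0.7),
]

def conservative_rank(ratings):
    """Sort models by the conservative skill estimate mu - 3*sigma.

    A model with a slightly lower mu but a much smaller sigma can outrank
    one whose rating is higher on average but noisier.
    """
    scored = [(name, mu - 3 * sigma) for name, mu, sigma in ratings]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

for rank, (name, score) in enumerate(conservative_rank(ratings), start=1):
    print(rank, name, round(score, 2))
```

Note how `model-b` drops to last despite the second-highest μ: its σ of 2.3 costs it 6.9 points under the μ − 3σ criterion.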
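The win/draw/lose bar plot referenced above summarizes pairwise arena votes. A small sketch of how such rates are derived from individual head-to-head outcomes (the vote list here is invented for illustration):

```python
from collections import Counter

def wdl_rates(outcomes):
    """Compute win/draw/lose rates from pairwise vote outcomes.

    `outcomes` holds one entry per head-to-head vote -- "win", "draw", or
    "lose" -- from the perspective of the model being evaluated.
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    return {k: counts[k] / total for k in ("win", "draw", "lose")}

# Hypothetical votes against one competitor (not the README's real data).
votes = ["win"] * 6 + ["draw"] * 1 + ["lose"] * 3
print(wdl_rates(votes))
```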
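The RL phase uses GRPO, whose defining step is a group-relative advantage: several completions are sampled per document, and each completion's reward is normalized against the group's mean and standard deviation, so no separate value network is needed. A minimal sketch of that normalization; the reward values are hypothetical stand-ins for the layout-centric reward described above:

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-normalized advantages as in GRPO.

    For a group of completions sampled from the same prompt, the advantage
    of each completion is its reward minus the group mean, divided by the
    group standard deviation (eps avoids division by zero for a
    constant-reward group).
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical layout-reward scores for 4 completions of one document.
advantages = grpo_advantages([0.2, 0.8, 0.5, 0.5])
```

Completions scoring above the group mean get positive advantages (their tokens are reinforced), those below get negative ones, and the advantages of each group sum to zero.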