liamcripwell committed (verified)
Commit 024704b · 1 Parent(s): aff8d50

Update README.md

Files changed (1):
  1. README.md +8 -8
README.md CHANGED
@@ -27,13 +27,13 @@ pipeline_tag: text-generation
 # NuMarkdown-reasoning 📄
 
 **NuMarkdown-reasoning** is the first reasoning vision-language model trained specifically to convert documents into clean GitHub-flavoured Markdown.
-It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-to-Markdown pairs, followed by a RL phase (GRPO) with a layout-centric reward.
+It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to-Markdown pairs, followed by an RL phase (GRPO) with a layout-centric reward.
 
-*(note: the number of thinking tokens can vary from 20% to 2X the number of token of the final answers)*
+*(Note: the number of thinking tokens can vary from 20% to 2X the number of tokens in the final answer.)*
 
 ## Results
 
-**NuMarkdown-reasoning** is significantly better than similar size non-reasoning models trained for markdown generation on complex documents, and achieve competitive results against top close sources alternatives.
+**NuMarkdown-reasoning** is significantly better than similar-size non-reasoning models trained for markdown generation on complex documents, and achieves competitive results against top closed-source alternatives.
 
 ### Arena ranking (using the trueskill-2 ranking system):
 <p align="center">
@@ -50,7 +50,7 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-t
 
 </p>
 
-*we plan to realease a markdown arena, similar to llmArena, for complex document to markdown task to help evaluate different document to markdown solution*
+*We plan to release a markdown arena, similar to llmArena, for the complex document-to-markdown task to help evaluate different document-to-markdown solutions.*
 
 ### Win-rate against other models (image-only):
 <p align="center">
@@ -60,10 +60,10 @@ It is a fine-tune of **Qwen 2.5-VL-7B** using ~10 k synthetic doc-to-Reasoning-t
 
 ## Training
 
-1. **SFT**: One-epoch supervised fine-tune on synthetic reasoning trace generated from public PDFs (10K input/output pairs).
-2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficults image examples).
+1. **SFT**: One-epoch supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
+2. **RL (GRPO)**: RL phase using a layout-centric reward (5K difficult image examples).
 
-**Model before GRPO loose 80% time vs post GRPO model (see win-rate matrix)**
+**The model before GRPO loses 80% of the time vs the post-GRPO model (see win-rate matrix).**
 
 
 ## Quick start: 🤗 Transformers
@@ -107,7 +107,7 @@ print(processor.decode(out[0].split("<answer>")[1].split("</answer>")[0], skip_s
 ```
 
 
-## VLLM:
+## vLLM:
 ```python
 from PIL import Image
 from vllm import LLM, SamplingParams
 
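The README's quick-start code (visible in the last hunk's context) extracts the final Markdown by splitting the generation on `<answer>` tags, with the reasoning trace emitted first. A minimal sketch of that post-processing, assuming the reasoning is wrapped in `<think>` tags; the `<think>` tag name is an assumption, since only `<answer>` appears in this diff:

```python
# Minimal sketch of post-processing a NuMarkdown generation.
# The <answer> tag comes from the README's quick-start snippet;
# the <think> tag for the reasoning trace is an assumption.
def split_generation(raw: str) -> tuple[str, str]:
    reasoning = ""
    if "<think>" in raw and "</think>" in raw:
        reasoning = raw.split("<think>")[1].split("</think>")[0].strip()
    markdown = raw.split("<answer>")[1].split("</answer>")[0].strip()
    return reasoning, markdown

reasoning, markdown = split_generation(
    "<think>Two columns, one table.</think><answer># Title</answer>"
)
print(markdown)  # -> "# Title"
```

This mirrors the `out[0].split("<answer>")[1].split("</answer>")[0]` pattern in the quick-start code, with a guard for generations where the reasoning trace is absent.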
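The training notes describe the GRPO phase only as using "a layout-centric reward" without defining it. Purely as an illustration of what a reward of that flavour could look like (a hypothetical sketch, not the authors' actual reward function), one could compare counts of structural Markdown elements between a candidate and a reference page:

```python
import re

# Hypothetical layout-centric reward sketch (NOT the authors' reward):
# compare counts of structural Markdown elements between a prediction
# and a reference, returning a score in [0, 1].
PATTERNS = {
    "heading": re.compile(r"^#{1,6} ", re.M),
    "table_row": re.compile(r"^\|.*\|$", re.M),
    "list_item": re.compile(r"^\s*([-*+]|\d+\.) ", re.M),
}

def layout_reward(pred: str, ref: str) -> float:
    scores = []
    for pattern in PATTERNS.values():
        n_pred = len(pattern.findall(pred))
        n_ref = len(pattern.findall(ref))
        if n_ref == 0 and n_pred == 0:
            continue
        scores.append(min(n_pred, n_ref) / max(n_pred, n_ref))
    return sum(scores) / len(scores) if scores else 1.0
```

The element set and the symmetric count-ratio scoring here are assumptions; a real layout reward would likely also score table cell alignment and reading order.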
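The vLLM snippet in the final hunk is cut off after its imports. A rough sketch of how such a script typically continues with vLLM's multimodal `generate` API; the model ID, Qwen2.5-VL-style prompt template, and sampling settings below are placeholders, not taken from the README:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Hypothetical continuation of the truncated snippet; the model ID and
# prompt template are placeholders, not confirmed by the README.
llm = LLM(model="numind/NuMarkdown-reasoning")  # placeholder model ID
sampling = SamplingParams(temperature=0.0, max_tokens=8192)

image = Image.open("page.png")
prompt = (
    "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
    "Convert this document to Markdown.<|im_end|>\n<|im_start|>assistant\n"
)

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling,
)
raw = outputs[0].outputs[0].text
print(raw.split("<answer>")[1].split("</answer>")[0])
```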