Update README.md

# NuMarkdown-reasoning 📄

**NuMarkdown-reasoning** is the first reasoning vision-language model trained specifically to convert documents into clean GitHub-flavoured Markdown.
It is a fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic Doc-to-Reasoning-to-Markdown pairs, followed by an RL phase (GRPO) with a layout-centric reward.

*(Note: the number of thinking tokens can vary from 20% to 2× the number of tokens in the final answer.)*
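
For concreteness, here is a minimal sketch of how the reasoning trace and the final Markdown could be separated and their sizes compared once a completion has been decoded. The `<answer>` tag matches the quick-start example further down; the `<think>` tag name and the crude whitespace token count are illustrative assumptions, not part of this README:

```python
# Minimal sketch: split a decoded completion into reasoning and answer.
# The <answer> tag matches the quick-start example below; the <think> tag
# and the whitespace "token" count are assumptions for illustration.
def split_completion(decoded: str) -> tuple[str, str]:
    reasoning = decoded.split("<think>")[1].split("</think>")[0]
    answer = decoded.split("<answer>")[1].split("</answer>")[0]
    return reasoning, answer

decoded = "<think>Two columns; merge the header row.</think><answer># Report\n| A | B |</answer>"
reasoning, answer = split_completion(decoded)

# Per the note above, this ratio typically falls between 0.2 and 2.
print(len(reasoning.split()) / len(answer.split()))
```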

## Results

**NuMarkdown-reasoning** is significantly better than similar-size non-reasoning models trained for markdown generation on complex documents, and achieves competitive results against top closed-source alternatives.

### Arena ranking (using the TrueSkill-2 ranking system):

<p align="center">
…
</p>

*We plan to release a markdown arena, similar to llmArena, for complex document-to-markdown tasks, to help evaluate different document-to-markdown solutions.*
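
As background on the ranking method, here is a minimal sketch of how arena-style pairwise votes can be turned into a leaderboard. It uses the open-source `trueskill` Python package, which implements the original TrueSkill rather than TrueSkill-2, so it only approximates the setup above; the model names and match records are invented for illustration:

```python
import trueskill  # pip install trueskill

# One rating per model; each vote is a (winner, loser) pair from the arena.
ratings = {name: trueskill.Rating() for name in
           ("NuMarkdown-reasoning", "model-A", "model-B")}
votes = [("NuMarkdown-reasoning", "model-A"),
         ("NuMarkdown-reasoning", "model-B"),
         ("model-B", "model-A")]

for winner, loser in votes:
    ratings[winner], ratings[loser] = trueskill.rate_1vs1(ratings[winner], ratings[loser])

# Rank by the conservative estimate mu - 3*sigma, as is customary for TrueSkill.
leaderboard = sorted(ratings.items(), key=lambda kv: kv[1].mu - 3 * kv[1].sigma, reverse=True)
for name, r in leaderboard:
    print(f"{name}: mu={r.mu:.2f} sigma={r.sigma:.2f}")
```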

### Win-rate against other models (image-only):

<p align="center">
…
</p>

## Training

1. **SFT**: one epoch of supervised fine-tuning on synthetic reasoning traces generated from public PDFs (10K input/output pairs).
2. **RL (GRPO)**: an RL phase using a layout-centric reward (5K difficult image examples); a sketch of what such a reward could look like is given below.

**The model before GRPO loses 80% of the time against the post-GRPO model (see the win-rate matrix).**
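
The reward itself is not spelled out in this README, so the following is only a hedged sketch of what a layout-centric reward for GRPO could look like: it scores a candidate Markdown output against a reference by F1 overlap of structural elements (headings by level, list items, table rows). Every name and design choice here is an assumption, not the published training code:

```python
import re
from collections import Counter

def layout_elements(md: str) -> Counter:
    """Count structural Markdown elements: headings by level, list items, table rows."""
    elems = Counter()
    for line in md.splitlines():
        heading = re.match(r"(#{1,6})\s", line)
        if heading:
            elems[f"h{len(heading.group(1))}"] += 1
        elif re.match(r"\s*([-*+]|\d+\.)\s", line):
            elems["list_item"] += 1
        elif line.lstrip().startswith("|"):
            elems["table_row"] += 1
    return elems

def layout_reward(candidate_md: str, reference_md: str) -> float:
    """Toy layout-centric reward: F1 overlap of layout elements vs the reference."""
    cand, ref = layout_elements(candidate_md), layout_elements(reference_md)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

In GRPO, a group of sampled completions for the same document would each be scored this way, with advantages computed relative to the group mean.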

## Quick start: 🤗 Transformers

```python
# …
print(processor.decode(out[0].split("<answer>")[1].split("</answer>")[0], skip_special_tokens=True))
```

## vLLM:

```python
from PIL import Image
from vllm import LLM, SamplingParams
# …
```