Alexandre-Numind committed on commit 96ce026 (verified) · 1 parent(s): 6636f1f

Update README.md

Files changed (1):
  1. README.md +21 -8
README.md CHANGED
@@ -21,21 +21,34 @@ pipeline_tag: text-generation
 **NuMarkdown-Qwen2.5-VL** is the first reasoning vision-language model trained to convert documents into clean GitHub-flavoured Markdown.
 It is a lightweight fine-tune of **Qwen 2.5-VL-7B** using ~10k synthetic doc-to-Markdown pairs, followed by an RL phase (GRPO) with a layout-centric reward.
 
-By increasing the output length by 10% to 20%, the model outperforms models of its size and is competitive with top closed-source reasoning models.
+(note: the number of thinking tokens can vary from 20% to 200% of the number of tokens in the final answer)
 
 ---
 ## Results
 
 (we plan to release a Markdown arena, similar to LMArena, for complex table-to-Markdown conversion)
 
-Win rate of our model vs open-source alternatives:
-
-//
-
-Win rate vs closed-source alternatives:
-
-//
+### Arena ranking (using the TrueSkill-2 rating system)
+
+| Rank | Model                                   | μ     | σ    | μ − 3σ |
+| ---- | --------------------------------------- | ----- | ---- | ------ |
+| 🥇 1 | **gemini-flash-reasoning**              | 26.75 | 0.80 | 24.35  |
+| 🥈 2 | **NuMarkdown-reasoning**                | 26.10 | 0.79 | 23.72  |
+| 🥉 3 | **NuMarkdown-reasoning-w/o\_reasoning** | 25.32 | 0.80 | 22.93  |
+| 4    | **OCRFlux-3B**                          | 24.63 | 0.80 | 22.22  |
+| 5    | **gpt-4o**                              | 24.48 | 0.80 | 22.08  |
+| 6    | **gemini-flash-w/o\_reasoning**         | 24.11 | 0.79 | 21.74  |
+| 7    | **RolmoOCR**                            | 23.53 | 0.82 | 21.07  |
+
+### Win rate of our model against other models:
+
+<img src="bar plot.png" width="500"/>
+
+### Win-rate matrix:
+
+<img src="matrix.png" width="500"/>
 
 ---

@@ -82,9 +95,9 @@ prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_
 enc = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)
 
 with torch.no_grad():
-    out = model.generate(**enc, max_new_tokens=1024)
+    out = model.generate(**enc, max_new_tokens=5000)
 
-print(processor.decode(out[0], skip_special_tokens=True))
+print(processor.decode(out[0], skip_special_tokens=True).split("<answer>")[1].split("</answer>")[0])
 ```

@@ -103,6 +116,6 @@ prompt = proc(text="Convert this to Markdown with reasoning.", image=img,
              return_tensors="np")  # numpy arrays for vLLM
 
 params = SamplingParams(max_tokens=1024, temperature=0.8, top_p=0.95)
-result = llm.generate([{"prompt": prompt}], params)[0].outputs[0].text
+result = llm.generate([{"prompt": prompt}], params)[0].outputs[0].text.split("<answer>")[1].split("</answer>")[0]
 print(result)
 ```
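The arena ranking in this commit orders models by the conservative TrueSkill-style estimate μ − 3σ (the lower bound of the skill belief, so an uncertain rating is penalized). A minimal sketch of that ranking rule, using the (μ, σ) pairs from the table; the `ratings` dict and `conservative` helper are illustrative names, not part of the model card:

```python
# Conservative TrueSkill-style leaderboard: rank by mu - 3*sigma.
# (mu, sigma) pairs are the values from the arena table in this commit.
ratings = {
    "gemini-flash-reasoning": (26.75, 0.80),
    "NuMarkdown-reasoning": (26.10, 0.79),
    "NuMarkdown-reasoning-w/o_reasoning": (25.32, 0.80),
    "OCRFlux-3B": (24.63, 0.80),
    "gpt-4o": (24.48, 0.80),
    "gemini-flash-w/o_reasoning": (24.11, 0.79),
    "RolmoOCR": (23.53, 0.82),
}

def conservative(mu: float, sigma: float) -> float:
    # Lower bound of the skill estimate (roughly 99.7% confidence).
    return mu - 3 * sigma

# Sort models by their conservative rating, best first.
leaderboard = sorted(ratings, key=lambda m: conservative(*ratings[m]), reverse=True)
```

Sorting these published (μ, σ) pairs reproduces the rank order shown in the table (small ±0.01 differences in μ − 3σ come from rounding of the displayed values).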
 
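Both updated snippets in this diff pull the final Markdown out of the generation with chained `split("<answer>")[1].split("</answer>")[0]` calls, which raise an `IndexError` if the model ever omits the tags. A defensive variant, assuming only the `<answer>…</answer>` format shown in the diff (the `extract_answer` helper is my own illustration, not from the model card):

```python
def extract_answer(text: str) -> str:
    """Return the content of the first <answer>...</answer> block,
    or the full text unchanged if the tags are missing."""
    start = text.find("<answer>")
    end = text.find("</answer>")
    if start == -1 or end == -1 or end < start:
        return text  # fall back to the raw generation
    return text[start + len("<answer>"):end].strip()
```

This drops the reasoning trace when the tags are present and degrades gracefully (returning everything) when they are not, instead of crashing the decode pipeline.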