Update README.md
README.md CHANGED

@@ -21,21 +21,34 @@ pipeline_tag: text-generation
**NuMarkdown-Qwen2.5-VL** is the first reasoning vision-language model trained to convert documents into clean GitHub-flavoured Markdown.
It is a lightweight fine-tune of **Qwen 2.5-VL-7B** on ~10k synthetic doc-to-Markdown pairs, followed by an RL phase (GRPO) with a layout-centric reward.

(Note: the number of thinking tokens can vary from 20% to 2x the number of tokens in the final answer.)
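Because the model reasons before answering, downstream code usually keeps only the final Markdown. A minimal sketch of that post-processing, assuming the output wraps the final Markdown in `<answer>…</answer>` tags as the usage snippets do (`extract_answer` and the `<think>` tag in the example string are illustrative, not part of the released API):

```python
def extract_answer(output: str) -> str:
    """Return the final Markdown from a raw generation.

    Assumes the model emits its reasoning first and wraps the final
    Markdown in <answer>...</answer> tags; falls back to the raw text
    if the tags are missing (e.g. the generation was truncated).
    """
    start, end = "<answer>", "</answer>"
    if start in output and end in output:
        return output.split(start, 1)[1].split(end, 1)[0].strip()
    return output.strip()

raw = "<think>the table has two columns</think><answer>| a | b |</answer>"
print(extract_answer(raw))  # -> | a | b |
```

Falling back to the raw text keeps the caller from crashing when a long reasoning trace eats the whole token budget before the `<answer>` tag appears.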
---
## Results

(We plan to release a Markdown arena, similar to LMArena, for comparing models on complex table-to-Markdown conversion.)
### Arena ranking (using the TrueSkill 2 ranking system)

| Rank | Model | μ | σ | μ − 3σ |
| ---- | --------------------------------------- | ----- | ---- | ------ |
| 🥇 1 | **gemini-flash-reasoning** | 26.75 | 0.80 | 24.35 |
| 🥈 2 | **NuMarkdown-reasoning** | 26.10 | 0.79 | 23.72 |
| 🥉 3 | **NuMarkdown-reasoning-w/o\_reasoning** | 25.32 | 0.80 | 22.93 |
| 4 | **OCRFlux-3B** | 24.63 | 0.80 | 22.22 |
| 5 | **gpt-4o** | 24.48 | 0.80 | 22.08 |
| 6 | **gemini-flash-w/o\_reasoning** | 24.11 | 0.79 | 21.74 |
| 7 | **RolmoOCR** | 23.53 | 0.82 | 21.07 |
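The μ − 3σ column is the conservative TrueSkill estimate: models with uncertain ratings (large σ) are ranked pessimistically. A quick sketch reproducing the ranking from the table above (the `ratings` dict is just the table's values inlined, not an API):

```python
# Conservative TrueSkill-style ranking: score each model by mu - 3*sigma,
# so a model must be both strong (high mu) and well-measured (low sigma)
# to rank highly.
ratings = {
    "gemini-flash-reasoning":             (26.75, 0.80),
    "NuMarkdown-reasoning":               (26.10, 0.79),
    "NuMarkdown-reasoning-w/o_reasoning": (25.32, 0.80),
    "OCRFlux-3B":                         (24.63, 0.80),
    "gpt-4o":                             (24.48, 0.80),
    "gemini-flash-w/o_reasoning":         (24.11, 0.79),
    "RolmoOCR":                           (23.53, 0.82),
}

# Sort by the conservative estimate, highest first.
leaderboard = sorted(ratings, key=lambda m: ratings[m][0] - 3 * ratings[m][1], reverse=True)
for rank, model in enumerate(leaderboard, 1):
    mu, sigma = ratings[model]
    print(f"{rank}. {model}: {mu - 3 * sigma:.2f}")
```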
### Win rate of our model against other models:

<img src="bar plot.png" width="500"/>

### Win-rate matrix:

<img src="matrix.png" width="500"/>
---
@@ -82,9 +95,9 @@ prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_

```python
enc = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**enc, max_new_tokens=5000)

# Decode the full sequence first, then pull the final Markdown out of the <answer> tags.
result = processor.decode(out[0], skip_special_tokens=True)
print(result.split("<answer>")[1].split("</answer>")[0])
```
@@ -103,6 +116,6 @@ prompt = proc(text="Convert this to Markdown with reasoning.", image=img,

```python
              return_tensors="np")  # numpy arrays for vLLM

params = SamplingParams(max_tokens=1024, temperature=0.8, top_p=0.95)
result = llm.generate([{"prompt": prompt}], params)[0].outputs[0].text.split("<answer>")[1].split("</answer>")[0]
print(result)
```