Update README.md
README.md
@@ -105,7 +105,7 @@ In this work, we use the Best-of-N evaluation strategy and employ [VisualPRM-8B]
### Multimodal Reasoning and Mathematics

### OCR, Chart, and Document Understanding
As shown in the table below, models fine-tuned with MPO outperform their counterparts trained without MPO on seven multimodal reasoning benchmarks. Specifically, InternVL3-78B and InternVL3-38B surpass their non-MPO counterparts by 4.1 and 4.5 points, respectively. Notably, the MPO training data is a subset of the SFT training data, which indicates that the gains stem primarily from the training algorithm rather than from the data.

### Variable Visual Position Encoding