Update README.md
README.md
## Performance
We evaluate Intern-S1-mini on a range of benchmarks covering both general and scientific datasets, and report a performance comparison with recent VLMs and LLMs below.

| Category   | Benchmark      | Intern-S1-mini | Qwen3-8B | GLM-4.1V | MiMo-VL-7B-RL-2508 |
|------------|----------------|----------------|----------|----------|--------------------|
| General    | MMLU-Pro       | **74.78**      | 73.7     | 57.1     | 73.93              |
|            | MMMU           | **72.33**      | N/A      | 69.9     | 70.4               |
|            | MMStar         | 65.2           | N/A      | 71.5     | 72.9               |
|            | GPQA           | **65.15**      | 62       | 50.32    | 60.35              |
|            | AIME2024       | **84.58**      | 76       | 36.2     | 72.6               |
|            | AIME2025       | **80**         | 67.3     | 32       | 64.4               |
|            | MathVision     | 51.41          | N/A      | 53.9     | 54.5               |
|            | MathVista      | 70.3           | N/A      | 80.7     | 79.4               |
|            | IFEval         | 81.15          | 85       | 71.53    | 71.4               |
| Scientific | SFE            | 35.84          | N/A      | 43.2     | 43.9               |
|            | Physics        | **28.76**      | N/A      | 4.3      | 23.9               |
|            | SmolInstruct   | **32.2**       | 17.6     | 18.1     | 16.11              |
|            | ChemBench      | **76.47**      | 61.1     | 56.2     | 66.78              |
|            | MatBench       | **61.55**      | 45.24    | 54.3     | 46.9               |
|            | MicroVQA       | **56.62**      | N/A      | 50.2     | 50.96              |
|            | ProteinLMBench | 58.47          | 59.1     | 58.3     | 59.8               |
|            | MSEarthMCQ     | **58.12**      | N/A      | 50.3     | 47.3               |
|            | XLRS-Bench     | **51.63**      | N/A      | 49.8     | 12.29              |
**Bold** indicates the best result among the compared models; N/A marks vision-language benchmarks on which the text-only Qwen3-8B cannot be evaluated.

We use [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalkit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
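As a rough guide, the sketch below shows how one of the multimodal benchmarks in the table could be re-run by invoking VLMEvalkit's `run.py` entry point from Python. It assumes a local clone of the toolkit with its dependencies installed, and the model identifier `Intern-S1-mini` is an assumption that should be checked against the toolkit's supported-model list.

```python
# Minimal sketch: re-running a benchmark from the table with VLMEvalkit.
# Assumes https://github.com/open-compass/vlmevalkit is cloned locally and
# its dependencies are installed; the model id "Intern-S1-mini" is an
# assumption -- verify it against the toolkit's supported-model list.
import subprocess

subprocess.run(
    [
        "python", "run.py",
        "--data", "MMStar",           # one of the benchmarks reported above
        "--model", "Intern-S1-mini",  # hypothetical model id
        "--verbose",
    ],
    cwd="VLMEvalKit",  # path to the local clone
    check=True,
)
```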