RangiLyu committed
Commit 12e6d31 · verified · 1 Parent(s): af13f02

Update README.md

Files changed (1): README.md (+24 −3)

README.md CHANGED
@@ -36,11 +36,32 @@ Built upon a 8B dense language model (Qwen3) and a 400M Vision encoder (InternVi
 
 ## Performance
 
-We evaluate the Intern-S1-mini on various benchmarks including general datasets and scientifc datasets. We report the performance comparsion with the recent VLMs and LLMs below.
+We evaluate the Intern-S1-mini on various benchmarks including general datasets and scientific datasets. We report the performance comparison with the recent VLMs and LLMs below.
+
+
+| | | Intern-S1-mini | Qwen3-8B | GLM-4.1V | MiMo-VL-7B-RL-2508 |
+|------------|----------------|-------------------|----------|----------|--------------------|
+| General | MMLU-Pro | **74.78** | 73.7 | 57.1 | 73.93 |
+|   | MMMU | **72.33** | N/A | 69.9 | 70.4 |
+|   | MMStar | 65.2 | N/A | 71.5 | 72.9 |
+|   | GPQA | **65.15** | 62 | 50.32 | 60.35 |
+|   | AIME2024 | **84.58** | 76 | 36.2 | 72.6 |
+|   | AIME2025 | **80** | 67.3 | 32 | 64.4 |
+|   | MathVision | 51.41 | N/A | 53.9 | 54.5 |
+|   | MathVista | 70.3 | N/A | 80.7 | 79.4 |
+|   | IFEval | 81.15 | 85 | 71.53 | 71.4 |
+| | | | | | |
+| Scientific | SFE | 35.84 | N/A | 43.2 | 43.9 |
+|   | Physics | **28.76** | N/A | 4.3 | 23.9 |
+|   | SmolInstruct | **32.2** | 17.6 | 18.1 | 16.11 |
+|   | ChemBench | **76.47** | 61.1 | 56.2 | 66.78 |
+|   | MatBench | **61.55** | 45.24 | 54.3 | 46.9 |
+|   | MicroVQA | **56.62** | N/A | 50.2 | 50.96 |
+|   | ProteinLMBench | 58.47 | 59.1 | 58.3 | 59.8 |
+|   | MSEarthMCQ | **58.12** | N/A | 50.3 | 47.3 |
+|   | XLRS-Bench | **51.63** | N/A | 49.8 | 12.29 |
 
 
-> **Note**: ✅ means the best performance among open-sourced models, 👑 indicates the best performance among all models.
-
 We use the [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalkit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
 
 
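The added table bolds scores where Intern-S1-mini leads its row. As a sanity check, here is a minimal sketch (not part of the commit; the scores are copied verbatim from the table above) that parses the data rows and verifies every bold score is indeed the maximum among the four listed models:

```python
# Sanity-check the README benchmark table: every **bold** score should be
# the row maximum. "N/A" marks benchmarks a model was not evaluated on.
MODELS = ["Intern-S1-mini", "Qwen3-8B", "GLM-4.1V", "MiMo-VL-7B-RL-2508"]

TABLE = """\
| General | MMLU-Pro | **74.78** | 73.7 | 57.1 | 73.93 |
|   | MMMU | **72.33** | N/A | 69.9 | 70.4 |
|   | MMStar | 65.2 | N/A | 71.5 | 72.9 |
|   | GPQA | **65.15** | 62 | 50.32 | 60.35 |
|   | AIME2024 | **84.58** | 76 | 36.2 | 72.6 |
|   | AIME2025 | **80** | 67.3 | 32 | 64.4 |
|   | MathVision | 51.41 | N/A | 53.9 | 54.5 |
|   | MathVista | 70.3 | N/A | 80.7 | 79.4 |
|   | IFEval | 81.15 | 85 | 71.53 | 71.4 |
| Scientific | SFE | 35.84 | N/A | 43.2 | 43.9 |
|   | Physics | **28.76** | N/A | 4.3 | 23.9 |
|   | SmolInstruct | **32.2** | 17.6 | 18.1 | 16.11 |
|   | ChemBench | **76.47** | 61.1 | 56.2 | 66.78 |
|   | MatBench | **61.55** | 45.24 | 54.3 | 46.9 |
|   | MicroVQA | **56.62** | N/A | 50.2 | 50.96 |
|   | ProteinLMBench | 58.47 | 59.1 | 58.3 | 59.8 |
|   | MSEarthMCQ | **58.12** | N/A | 50.3 | 47.3 |
|   | XLRS-Bench | **51.63** | N/A | 49.8 | 12.29 |
"""

def parse(table):
    """Return {benchmark: [(score_or_None, is_bold), ...]}, one entry per model."""
    rows = {}
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 6 or not cells[1]:
            continue  # skip spacer/malformed rows
        scores = []
        for raw in cells[2:]:
            bold = raw.startswith("**") and raw.endswith("**")
            val = raw.strip("*")
            scores.append((None if val == "N/A" else float(val), bold))
        rows[cells[1]] = scores
    return rows

def best_model(scores):
    """Name of the top-scoring model in a row, ignoring N/A entries."""
    return max((s, m) for (s, _), m in zip(scores, MODELS) if s is not None)[1]

if __name__ == "__main__":
    rows = parse(TABLE)
    for bench, scores in rows.items():
        top = max(s for s, _ in scores if s is not None)
        for s, bold in scores:
            if bold:
                assert s == top, f"{bench}: bold {s} is not the row max {top}"
    print(best_model(rows["MMLU-Pro"]))  # Intern-S1-mini
```

Note the check is one-directional: a row maximum held by another model (e.g. MMStar's 72.9) is not bolded, since bold only marks Intern-S1-mini's wins.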