Update README.md

---
license: apache-2.0
pipeline_tag: image-text-to-text
---

## Intern-S1

![image](https://raw.githubusercontent.com/InternLM/Intern-S1/main/assets/logo.png)

[Intern-S1 on GitHub](https://github.com/InternLM/Intern-S1)

## Introduction

We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.

We evaluate Intern-S1 on various benchmarks covering both general and scientific datasets, and report the performance comparison with recent VLMs and LLMs below.

<table>
  <thead>
    <tr>
      <th>Benchmarks</th>
      <th>Intern-S1</th>
      <th>InternVL3-78B</th>
      <th>Qwen2.5-VL-72B</th>
      <th>DS-R1-0528</th>
      <th>Qwen3-235B-A22B</th>
      <th>Kimi-K2-Instruct</th>
      <th>Gemini-2.5 Pro</th>
      <th>o3</th>
      <th>Grok-4</th>
    </tr>
  </thead>
  <tbody>
    <tr><td>MMLU-Pro</td><td>83.5 ✅</td><td>73.0</td><td>72.1</td><td>83.4</td><td>82.2</td><td>82.7</td><td>86.0</td><td>85.0</td><td>85.9</td></tr>
    <tr><td>MMMU</td><td>77.7 ✅</td><td>72.2</td><td>70.2</td><td>-</td><td>-</td><td>-</td><td>81.9</td><td>80.8</td><td>77.9</td></tr>
    <tr><td>GPQA</td><td>77.3</td><td>49.9</td><td>49.0</td><td>80.6</td><td>71.1</td><td>77.8</td><td>83.8</td><td>83.3</td><td>87.5</td></tr>
    <tr><td>MMStar</td><td>74.9 ✅</td><td>72.5</td><td>70.8</td><td>-</td><td>-</td><td>-</td><td>79.3</td><td>75.1</td><td>69.6</td></tr>
    <tr><td>MathVista</td><td>81.5 👑</td><td>79.0</td><td>74.8</td><td>-</td><td>-</td><td>-</td><td>80.3</td><td>77.5</td><td>72.5</td></tr>
    <tr><td>AIME2025</td><td>86.0</td><td>10.7</td><td>10.9</td><td>87.5</td><td>81.5</td><td>51.4</td><td>83.0</td><td>88.9</td><td>91.7</td></tr>
    <tr><td>MathVision</td><td>62.5 ✅</td><td>43.1</td><td>38.1</td><td>-</td><td>-</td><td>-</td><td>73.0</td><td>67.7</td><td>67.3</td></tr>
    <tr><td>IFEval</td><td>86.7</td><td>75.6</td><td>83.9</td><td>79.7</td><td>85.0</td><td>90.2</td><td>91.5</td><td>92.2</td><td>92.8</td></tr>
    <tr><td>SFE</td><td>44.3 👑</td><td>36.2</td><td>30.5</td><td>-</td><td>-</td><td>-</td><td>43.0</td><td>37.7</td><td>31.2</td></tr>
    <tr><td>Physics</td><td>44.0 ✅</td><td>23.1</td><td>15.7</td><td>-</td><td>-</td><td>-</td><td>40.0</td><td>47.9</td><td>42.8</td></tr>
    <tr><td>SMolInstruct</td><td>51.0 👑</td><td>19.4</td><td>21.0</td><td>30.7</td><td>28.7</td><td>48.1</td><td>40.4</td><td>43.9</td><td>47.3</td></tr>
    <tr><td>ChemBench</td><td>83.4 👑</td><td>61.3</td><td>61.6</td><td>75.6</td><td>75.8</td><td>75.3</td><td>82.8</td><td>81.6</td><td>83.3</td></tr>
    <tr><td>MatBench</td><td>75.0 👑</td><td>49.3</td><td>51.5</td><td>57.7</td><td>52.1</td><td>61.7</td><td>61.7</td><td>61.6</td><td>67.9</td></tr>
    <tr><td>MicroVQA</td><td>63.9 👑</td><td>59.1</td><td>53.0</td><td>-</td><td>-</td><td>-</td><td>63.1</td><td>58.3</td><td>59.5</td></tr>
    <tr><td>ProteinLMBench</td><td>63.1</td><td>61.6</td><td>61.0</td><td>61.4</td><td>59.8</td><td>66.7</td><td>62.9</td><td>67.7</td><td>66.2</td></tr>
    <tr><td>MSEarthMCQ</td><td>65.7 👑</td><td>57.2</td><td>37.6</td><td>-</td><td>-</td><td>-</td><td>59.9</td><td>61.0</td><td>58.0</td></tr>
    <tr><td>XLRS-Bench</td><td>55.0 👑</td><td>49.3</td><td>50.9</td><td>-</td><td>-</td><td>-</td><td>45.2</td><td>43.6</td><td>45.4</td></tr>
  </tbody>
</table>

> **Note**: ✅ indicates the best performance among open-source models, and 👑 indicates the best performance among all models.

We use [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalKit](https://github.com/open-compass/vlmevalkit) to evaluate all models.

```python
# ... (the earlier lines of the text-input example are elided in the diff) ...
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(decoded_output)
```

#### Image input

```python
from transformers import AutoProcessor, AutoModelForCausalLM
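import torch

# NOTE: the diff truncates this example after the import above. The rest is a
# hedged sketch mirroring the text-input example earlier in the README; the
# image URL and max_new_tokens value are illustrative assumptions.
model_name = "internlm/Intern-S1"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/sample.jpg"},  # placeholder image
            {"type": "text", "text": "Please describe the image."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generate_ids = model.generate(**inputs, max_new_tokens=1024)
decoded_output = processor.decode(
    generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True
)
print(decoded_output)
```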

#### [sglang](https://github.com/sgl-project/sglang)

Support for Intern-S1 in SGLang is still in progress; please refer to this [PR](https://github.com/sgl-project/sglang/pull/8350).

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python3 -m sglang.launch_server \
  --model-path internlm/Intern-S1 \
  --trust-remote-code \
  --mem-fraction-static 0.85 \
  --tp 8 \
  --enable-multimodal \
  --grammar-backend none
```
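
Once the server is up, it exposes an OpenAI-compatible API. Below is a minimal sketch of a multimodal request against it, assuming SGLang's default port 30000 and a placeholder image URL (both are assumptions, not taken from the README):

```python
from openai import OpenAI

# Assumes SGLang's default port (30000); adjust if you passed --port.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")

response = client.chat.completions.create(
    model="internlm/Intern-S1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},  # placeholder image
            {"type": "text", "text": "Describe this image."},
        ],
    }],
)
print(response.choices[0].message.content)
```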

```bash
# install ollama
curl -fsSL https://ollama.com/install.sh | sh
# fetch model
ollama pull internlm/interns1
# run model
ollama run internlm/interns1
# then use openai client to call on http://localhost:11434/v1
```
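
As the final comment suggests, the served model can then be queried with the OpenAI client. A minimal sketch, assuming the default ollama port and the model tag pulled above (the prompt is illustrative):

```python
from openai import OpenAI

# ollama serves an OpenAI-compatible API on port 11434; the api_key is a dummy value.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="internlm/interns1",
    messages=[{"role": "user", "content": "Briefly introduce yourself."}],
)
print(response.choices[0].message.content)
```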

A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile: it works not just with OpenAI models, but with any model that follows the same interface standard.

To illustrate how this works, let's dive into a practical code example that uses tool calling to get the latest weather forecast (based on the LMDeploy API server).

```python
from openai import OpenAI
import json

# ... (client setup, tool schema, and initial messages elided in the diff) ...

response = client.chat.completions.create(
    # ... (model and messages arguments elided in the diff) ...
    temperature=0.8,
    top_p=0.8,
    stream=False,
    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
    tools=tools)
print(response.choices[0].message)
messages.append(response.choices[0].message)
```

```python
# ... (tool execution and appending the tool result elided in the diff) ...

response = client.chat.completions.create(
    # ... (model and messages arguments elided in the diff) ...
    temperature=0.8,
    top_p=0.8,
    stream=False,
    extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
    tools=tools)
print(response.choices[0].message.content)
```
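
Since the diff shows this example only in fragments, here is a condensed, self-contained sketch of the same two-round flow. The endpoint URL, the model-discovery call, and the `get_current_weather` stub are illustrative assumptions, not the README's exact code:

```python
from openai import OpenAI
import json

# Assumed local OpenAI-compatible endpoint (e.g. an LMDeploy api_server on its default port).
client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
model_name = client.models.list().data[0].id

def get_current_weather(location: str, unit: str = "celsius"):
    # Hypothetical stand-in for a real weather API call.
    return json.dumps({"location": location, "temperature": "22", "unit": unit})

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather like in Shanghai?"}]

# First round: the model decides whether to call the tool.
response = client.chat.completions.create(
    model=model_name, messages=messages, tools=tools, stream=False)
assistant_msg = response.choices[0].message
messages.append(assistant_msg)

# Execute each requested tool call and feed the result back.
for call in assistant_msg.tool_calls or []:
    args = json.loads(call.function.arguments)
    result = get_current_weather(**args)
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

# Second round: the model composes the final answer from the tool output.
response = client.chat.completions.create(
    model=model_name, messages=messages, tools=tools, stream=False)
print(response.choices[0].message.content)
```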

### Switching Between Thinking and Non-Thinking Modes

For vLLM and SGLang users, configure this through:

```python
extra_body={
    "chat_template_kwargs": {"enable_thinking": False}
}
```
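
For instance, a complete non-thinking request could look like the sketch below; the base URL and prompt are assumed values for a local OpenAI-compatible server:

```python
from openai import OpenAI

# Assumed local OpenAI-compatible server; the port is an example.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

response = client.chat.completions.create(
    model="internlm/Intern-S1",
    messages=[{"role": "user", "content": "Summarize the benefits of open-source models in one sentence."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},  # skip the reasoning trace
)
print(response.choices[0].message.content)
```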
|