RangiLyu committed
Commit 2a79595 · verified · 1 Parent(s): ff0c7fd

Update README.md

Files changed (1)
  1. README.md +56 -13
README.md CHANGED
@@ -1,13 +1,17 @@
1
- ---
2
- license: apache-2.0
3
- pipeline_tag: image-text-to-text
4
- ---
5
 
6
 
7
  ## Intern-S1
8
 
9
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/642695e5274e7ad464c8a5ba/E43cgEXBRWjVJlU_-hdh6.png)
10
 
11
  ## Introduction
12
 
13
  We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
@@ -24,7 +28,43 @@ Features
24
 
25
  We evaluate Intern-S1 on various benchmarks, including general and scientific datasets. We report the performance comparison with recent VLMs and LLMs below.
26
 
27
-
28
 
29
  We use [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalKit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
30
 
@@ -74,7 +114,7 @@ decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :
74
  print(decoded_output)
75
  ```
76
 
77
- #### Image input
78
 
79
  ```python
80
  from transformers import AutoProcessor, AutoModelForCausalLM
@@ -156,11 +196,14 @@ Coming soon.
156
 
157
  #### [sglang](https://github.com/sgl-project/sglang)
158
 
159
  ```bash
160
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
161
  python3 -m sglang.launch_server \
162
  --model-path internlm/Intern-S1 \
163
  --trust-remote-code \
164
  --tp 8 \
165
  --enable-multimodal \
166
  --grammar-backend none
@@ -172,9 +215,9 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
172
  # install ollama
173
  curl -fsSL https://ollama.com/install.sh | sh
174
  # fetch model
175
- ollama pull internlm/Intern-S1
176
  # run model
177
- ollama run internlm/Intern-S1
178
  # then use openai client to call on http://localhost:11434/v1
179
  ```
180
 
@@ -186,9 +229,10 @@ Many Large Language Models (LLMs) now feature **Tool Calling**, a powerful capab
186
 
187
  A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile—it works not just with OpenAI models, but with any model that follows the same interface standard.
188
 
189
- To illustrate how this works, let's dive into a practical code example that uses tool calling to get the latest weather forecast.
190
 
191
  ```python
192
  from openai import OpenAI
193
  import json
194
 
@@ -313,7 +357,7 @@ response = client.chat.completions.create(
313
  temperature=0.8,
314
  top_p=0.8,
315
  stream=False,
316
- extra_body=dict(spaces_between_special_tokens=False),
317
  tools=tools)
318
  print(response.choices[0].message)
319
  messages.append(response.choices[0].message)
@@ -335,11 +379,10 @@ response = client.chat.completions.create(
335
  temperature=0.8,
336
  top_p=0.8,
337
  stream=False,
338
- extra_body=dict(spaces_between_special_tokens=False),
339
  tools=tools)
340
  print(response.choices[0].message.content)
341
  ```
342
-
343
 
344
  ### Switching Between Thinking and Non-Thinking Modes
345
 
@@ -400,4 +443,4 @@ For vllm and sglang users, configure this through,
400
  extra_body={
401
  "chat_template_kwargs": {"enable_thinking": false}
402
  }
403
- ```
 
1
+ ---
2
+ license: apache-2.0
3
+ pipeline_tag: image-text-to-text
4
+ ---
5
 
6
 
7
  ## Intern-S1
8
 
9
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/642695e5274e7ad464c8a5ba/E43cgEXBRWjVJlU_-hdh6.png)
10
 
11
+
12
+ [![GitHub](https://img.shields.io/badge/GitHub-InternS1-blue)](https://github.com/InternLM/Intern-S1)
13
+
14
+
15
  ## Introduction
16
 
17
  We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
 
28
 
29
  We evaluate Intern-S1 on various benchmarks, including general and scientific datasets. We report the performance comparison with recent VLMs and LLMs below.
30
 
31
+ <table>
32
+ <thead>
33
+ <tr>
34
+ <th rowspan="2">Benchmarks</th>
35
+ <th colspan="2">Intern-S1</th>
36
+ <th>InternVL3-78B</th>
37
+ <th>Qwen2.5-VL-72B</th>
38
+ <th>DS-R1-0528</th>
39
+ <th>Qwen3-235B-A22B</th>
40
+ <th>Kimi-K2-Instruct</th>
41
+ <th>Gemini-2.5 Pro</th>
42
+ <th>o3</th>
43
+ <th>Grok-4</th>
44
+ </tr>
45
+ </thead>
46
+ <tbody>
47
+ <tr><td>MMLU-Pro</td><td colspan="2">83.5 ✅</td><td>73.0</td><td>72.1</td><td>83.4</td><td>82.2</td><td>82.7</td><td>86.0</td><td>85.0</td><td>85.9</td></tr>
48
+ <tr><td>MMMU</td><td colspan="2">77.7 ✅</td><td>72.2</td><td>70.2</td><td>-</td><td>-</td><td>-</td><td>81.9</td><td>80.8</td><td>77.9</td></tr>
49
+ <tr><td>GPQA</td><td colspan="2">77.3</td><td>49.9</td><td>49.0</td><td>80.6</td><td>71.1</td><td>77.8</td><td>83.8</td><td>83.3</td><td>87.5</td></tr>
50
+ <tr><td>MMStar</td><td colspan="2">74.9 ✅</td><td>72.5</td><td>70.8</td><td>-</td><td>-</td><td>-</td><td>79.3</td><td>75.1</td><td>69.6</td></tr>
51
+ <tr><td>MathVista</td><td colspan="2">81.5 👑</td><td>79.0</td><td>74.8</td><td>-</td><td>-</td><td>-</td><td>80.3</td><td>77.5</td><td>72.5</td></tr>
52
+ <tr><td>AIME2025</td><td colspan="2">86.0</td><td>10.7</td><td>10.9</td><td>87.5</td><td>81.5</td><td>51.4</td><td>83.0</td><td>88.9</td><td>91.7</td></tr>
53
+ <tr><td>MathVision</td><td colspan="2">62.5 ✅</td><td>43.1</td><td>38.1</td><td>-</td><td>-</td><td>-</td><td>73.0</td><td>67.7</td><td>67.3</td></tr>
54
+ <tr><td>IFEval</td><td colspan="2">86.7</td><td>75.6</td><td>83.9</td><td>79.7</td><td>85.0</td><td>90.2</td><td>91.5</td><td>92.2</td><td>92.8</td></tr>
55
+ <tr><td>SFE</td><td colspan="2">44.3 👑</td><td>36.2</td><td>30.5</td><td>-</td><td>-</td><td>-</td><td>43.0</td><td>37.7</td><td>31.2</td></tr>
56
+ <tr><td>Physics</td><td colspan="2">44.0 ✅</td><td>23.1</td><td>15.7</td><td>-</td><td>-</td><td>-</td><td>40.0</td><td>47.9</td><td>42.8</td></tr>
57
+ <tr><td>SmolInstruct</td><td colspan="2">51.0 👑</td><td>19.4</td><td>21.0</td><td>30.7</td><td>28.7</td><td>48.1</td><td>40.4</td><td>43.9</td><td>47.3</td></tr>
58
+ <tr><td>ChemBench</td><td colspan="2">83.4 👑</td><td>61.3</td><td>61.6</td><td>75.6</td><td>75.8</td><td>75.3</td><td>82.8</td><td>81.6</td><td>83.3</td></tr>
59
+ <tr><td>MatBench</td><td colspan="2">75.0 👑</td><td>49.3</td><td>51.5</td><td>57.7</td><td>52.1</td><td>61.7</td><td>61.7</td><td>61.6</td><td>67.9</td></tr>
60
+ <tr><td>MicroVQA</td><td colspan="2">63.9 👑</td><td>59.1</td><td>53.0</td><td>-</td><td>-</td><td>-</td><td>63.1</td><td>58.3</td><td>59.5</td></tr>
61
+ <tr><td>ProteinLMBench</td><td colspan="2">63.1</td><td>61.6</td><td>61.0</td><td>61.4</td><td>59.8</td><td>66.7</td><td>62.9</td><td>67.7</td><td>66.2</td></tr>
62
+ <tr><td>MSEarthMCQ</td><td colspan="2">65.7 👑</td><td>57.2</td><td>37.6</td><td>-</td><td>-</td><td>-</td><td>59.9</td><td>61.0</td><td>58.0</td></tr>
63
+ <tr><td>XLRS-Bench</td><td colspan="2">55.0 👑</td><td>49.3</td><td>50.9</td><td>-</td><td>-</td><td>-</td><td>45.2</td><td>43.6</td><td>45.4</td></tr>
64
+ </tbody>
65
+ </table>
66
+
67
+ > **Note**: ✅ indicates the best performance among open-source models; 👑 indicates the best performance among all models.
68
 
69
  We use [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalKit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
70
 
 
114
  print(decoded_output)
115
  ```
116
 
117
+ #### Image input
118
 
119
  ```python
120
  from transformers import AutoProcessor, AutoModelForCausalLM
 
196
 
197
  #### [sglang](https://github.com/sgl-project/sglang)
198
 
199
+ Support for Intern-S1 in SGLang is still in progress; please refer to this [PR](https://github.com/sgl-project/sglang/pull/8350).
200
+
201
  ```bash
202
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
203
  python3 -m sglang.launch_server \
204
  --model-path internlm/Intern-S1 \
205
  --trust-remote-code \
206
+ --mem-fraction-static 0.85 \
207
  --tp 8 \
208
  --enable-multimodal \
209
  --grammar-backend none
 
215
  # install ollama
216
  curl -fsSL https://ollama.com/install.sh | sh
217
  # fetch model
218
+ ollama pull internlm/interns1
219
  # run model
220
+ ollama run internlm/interns1
221
  # then use openai client to call on http://localhost:11434/v1
222
  ```
223
 
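Once the server is running, the model can be called through Ollama's OpenAI-compatible endpoint, as the comment above notes. Below is a minimal sketch using the standard OpenAI client; the prompt is illustrative, and the endpoint and model name follow the commands above.

```python
# Minimal sketch: call the Ollama-served model via its OpenAI-compatible API.
# Endpoint and model name follow the ollama commands above; the prompt is illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string works for a local server
)

response = client.chat.completions.create(
    model="internlm/interns1",
    messages=[{"role": "user", "content": "Briefly introduce yourself."}],
)
print(response.choices[0].message.content)
```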
 
229
 
230
  A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile—it works not just with OpenAI models, but with any model that follows the same interface standard.
231
 
232
+ To illustrate how this works, let's dive into a practical code example that uses tool calling to get the latest weather forecast (based on the LMDeploy API server).
233
 
234
  ```python
235
+
236
  from openai import OpenAI
237
  import json
238
 
 
357
  temperature=0.8,
358
  top_p=0.8,
359
  stream=False,
360
+ extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
361
  tools=tools)
362
  print(response.choices[0].message)
363
  messages.append(response.choices[0].message)
 
379
  temperature=0.8,
380
  top_p=0.8,
381
  stream=False,
382
+ extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
383
  tools=tools)
384
  print(response.choices[0].message.content)
385
  ```
 
386
 
387
  ### Switching Between Thinking and Non-Thinking Modes
388
 
 
443
  extra_body={
444
  "chat_template_kwargs": {"enable_thinking": false}
445
  }
446
+ ```
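
Putting it together, a complete non-thinking request against an OpenAI-compatible vLLM or SGLang server could look like the minimal sketch below; the base URL and model name are placeholders for your own deployment, and note that the flag becomes `False` in Python.

```python
# Minimal sketch: disable thinking mode through chat_template_kwargs on an
# OpenAI-compatible server (vLLM/SGLang). Base URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internlm/Intern-S1",
    messages=[{"role": "user", "content": "Explain tool calling in one sentence."}],
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False}  # Python bool, not JSON `false`
    },
)
print(response.choices[0].message.content)
```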