RangiLyu committed
Commit 2a79595 · verified · 1 Parent(s): ff0c7fd

Update README.md

Files changed (1)
  1. README.md +56 -13
README.md CHANGED
@@ -1,13 +1,17 @@
1
- ---
2
- license: apache-2.0
3
- pipeline_tag: image-text-to-text
4
- ---
5
 
6
 
7
  ## Intern-S1
8
 
9
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/642695e5274e7ad464c8a5ba/E43cgEXBRWjVJlU_-hdh6.png)
10
 
11
  ## Introduction
12
 
13
  We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
@@ -24,7 +28,43 @@ Features
24
 
25
  We evaluate Intern-S1 on various benchmarks, including general and scientific datasets. We report the performance comparison with recent VLMs and LLMs below.
26
 
27
-
28
 
29
  We use [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalKit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
30
 
@@ -74,7 +114,7 @@ decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :
74
  print(decoded_output)
75
  ```
76
 
77
- #### Image input
78
 
79
  ```python
80
  from transformers import AutoProcessor, AutoModelForCausalLM
@@ -156,11 +196,14 @@ Coming soon.
156
 
157
  #### [sglang](https://github.com/sgl-project/sglang)
158
 
159
  ```bash
160
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
161
  python3 -m sglang.launch_server \
162
  --model-path internlm/Intern-S1 \
163
  --trust-remote-code \
164
  --tp 8 \
165
  --enable-multimodal \
166
  --grammar-backend none
@@ -172,9 +215,9 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
172
  # install ollama
173
  curl -fsSL https://ollama.com/install.sh | sh
174
  # fetch model
175
- ollama pull internlm/Intern-S1
176
  # run model
177
- ollama run internlm/Intern-S1
178
  # then use openai client to call on http://localhost:11434/v1
179
  ```
180
 
@@ -186,9 +229,10 @@ Many Large Language Models (LLMs) now feature **Tool Calling**, a powerful capab
186
 
187
  A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile—it works not just with OpenAI models, but with any model that follows the same interface standard.
188
 
189
- To illustrate how this works, let's dive into a practical code example that uses tool calling to get the latest weather forecast.
190
 
191
  ```python
192
  from openai import OpenAI
193
  import json
194
 
@@ -313,7 +357,7 @@ response = client.chat.completions.create(
313
  temperature=0.8,
314
  top_p=0.8,
315
  stream=False,
316
- extra_body=dict(spaces_between_special_tokens=False),
317
  tools=tools)
318
  print(response.choices[0].message)
319
  messages.append(response.choices[0].message)
@@ -335,11 +379,10 @@ response = client.chat.completions.create(
335
  temperature=0.8,
336
  top_p=0.8,
337
  stream=False,
338
- extra_body=dict(spaces_between_special_tokens=False),
339
  tools=tools)
340
  print(response.choices[0].message.content)
341
  ```
342
-
343
 
344
  ### Switching Between Thinking and Non-Thinking Modes
345
 
@@ -400,4 +443,4 @@ For vllm and sglang users, configure this through,
400
  extra_body={
401
  "chat_template_kwargs": {"enable_thinking": false}
402
  }
403
- ```
 
1
+ ---
2
+ license: apache-2.0
3
+ pipeline_tag: image-text-to-text
4
+ ---
5
 
6
 
7
  ## Intern-S1
8
 
9
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/642695e5274e7ad464c8a5ba/E43cgEXBRWjVJlU_-hdh6.png)
10
 
11
+
12
+ [![GitHub](https://img.shields.io/badge/GitHub-InternS1-blue)](https://github.com/InternLM/Intern-S1)
13
+
14
+
15
  ## Introduction
16
 
17
  We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
 
28
 
29
  We evaluate Intern-S1 on various benchmarks, including general and scientific datasets. We report the performance comparison with recent VLMs and LLMs below.
30
 
31
+ <table>
32
+ <thead>
33
+ <tr>
34
+ <th rowspan="2">Benchmarks</th>
35
+ <th colspan="2">Intern-S1</th>
36
+ <th>InternVL3-78B</th>
37
+ <th>Qwen2.5-VL-72B</th>
38
+ <th>DS-R1-0528</th>
39
+ <th>Qwen3-235B-A22B</th>
40
+ <th>Kimi-K2-Instruct</th>
41
+ <th>Gemini-2.5 Pro</th>
42
+ <th>o3</th>
43
+ <th>Grok-4</th>
44
+ </tr>
45
+ </thead>
46
+ <tbody>
47
+ <tr><td>MMLU-Pro</td><td colspan="2">83.5 ✅</td><td>73.0</td><td>72.1</td><td>83.4</td><td>82.2</td><td>82.7</td><td>86.0</td><td>85.0</td><td>85.9</td></tr>
48
+ <tr><td>MMMU</td><td colspan="2">77.7 ✅</td><td>72.2</td><td>70.2</td><td>-</td><td>-</td><td>-</td><td>81.9</td><td>80.8</td><td>77.9</td></tr>
49
+ <tr><td>GPQA</td><td colspan="2">77.3</td><td>49.9</td><td>49.0</td><td>80.6</td><td>71.1</td><td>77.8</td><td>83.8</td><td>83.3</td><td>87.5</td></tr>
50
+ <tr><td>MMStar</td><td colspan="2">74.9 ✅</td><td>72.5</td><td>70.8</td><td>-</td><td>-</td><td>-</td><td>79.3</td><td>75.1</td><td>69.6</td></tr>
51
+ <tr><td>MathVista</td><td colspan="2">81.5 👑</td><td>79.0</td><td>74.8</td><td>-</td><td>-</td><td>-</td><td>80.3</td><td>77.5</td><td>72.5</td></tr>
52
+ <tr><td>AIME2025</td><td colspan="2">86.0</td><td>10.7</td><td>10.9</td><td>87.5</td><td>81.5</td><td>51.4</td><td>83.0</td><td>88.9</td><td>91.7</td></tr>
53
+ <tr><td>MathVision</td><td colspan="2">62.5 ✅</td><td>43.1</td><td>38.1</td><td>-</td><td>-</td><td>-</td><td>73.0</td><td>67.7</td><td>67.3</td></tr>
54
+ <tr><td>IFEval</td><td colspan="2">86.7</td><td>75.6</td><td>83.9</td><td>79.7</td><td>85.0</td><td>90.2</td><td>91.5</td><td>92.2</td><td>92.8</td></tr>
55
+ <tr><td>SFE</td><td colspan="2">44.3 👑</td><td>36.2</td><td>30.5</td><td>-</td><td>-</td><td>-</td><td>43.0</td><td>37.7</td><td>31.2</td></tr>
56
+ <tr><td>Physics</td><td colspan="2">44.0 ✅</td><td>23.1</td><td>15.7</td><td>-</td><td>-</td><td>-</td><td>40.0</td><td>47.9</td><td>42.8</td></tr>
57
+ <tr><td>SmolInstruct</td><td colspan="2">51.0 👑</td><td>19.4</td><td>21.0</td><td>30.7</td><td>28.7</td><td>48.1</td><td>40.4</td><td>43.9</td><td>47.3</td></tr>
58
+ <tr><td>ChemBench</td><td colspan="2">83.4 👑</td><td>61.3</td><td>61.6</td><td>75.6</td><td>75.8</td><td>75.3</td><td>82.8</td><td>81.6</td><td>83.3</td></tr>
59
+ <tr><td>MatBench</td><td colspan="2">75.0 👑</td><td>49.3</td><td>51.5</td><td>57.7</td><td>52.1</td><td>61.7</td><td>61.7</td><td>61.6</td><td>67.9</td></tr>
60
+ <tr><td>MicroVQA</td><td colspan="2">63.9 👑</td><td>59.1</td><td>53.0</td><td>-</td><td>-</td><td>-</td><td>63.1</td><td>58.3</td><td>59.5</td></tr>
61
+ <tr><td>ProteinLMBench</td><td colspan="2">63.1</td><td>61.6</td><td>61.0</td><td>61.4</td><td>59.8</td><td>66.7</td><td>62.9</td><td>67.7</td><td>66.2</td></tr>
62
+ <tr><td>MSEarthMCQ</td><td colspan="2">65.7 👑</td><td>57.2</td><td>37.6</td><td>-</td><td>-</td><td>-</td><td>59.9</td><td>61.0</td><td>58.0</td></tr>
63
+ <tr><td>XLRS-Bench</td><td colspan="2">55.0 👑</td><td>49.3</td><td>50.9</td><td>-</td><td>-</td><td>-</td><td>45.2</td><td>43.6</td><td>45.4</td></tr>
64
+ </tbody>
65
+ </table>
66
+
67
+ > **Note**: ✅ indicates the best performance among open-source models; 👑 indicates the best performance among all models.
68
 
69
  We use [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalKit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
70
 
 
114
  print(decoded_output)
115
  ```
116
 
117
+ #### Image input
118
 
119
  ```python
120
  from transformers import AutoProcessor, AutoModelForCausalLM
 
196
 
197
  #### [sglang](https://github.com/sgl-project/sglang)
198
 
199
+ Support for Intern-S1 in SGLang is still in progress; please refer to this [PR](https://github.com/sgl-project/sglang/pull/8350).
200
+
201
  ```bash
202
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
203
  python3 -m sglang.launch_server \
204
  --model-path internlm/Intern-S1 \
205
  --trust-remote-code \
206
+ --mem-fraction-static 0.85 \
207
  --tp 8 \
208
  --enable-multimodal \
209
  --grammar-backend none
 
215
  # install ollama
216
  curl -fsSL https://ollama.com/install.sh | sh
217
  # fetch model
218
+ ollama pull internlm/interns1
219
  # run model
220
+ ollama run internlm/interns1
221
  # then use openai client to call on http://localhost:11434/v1
222
  ```
223
 
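Once the server is running, the model can be called through Ollama's OpenAI-compatible endpoint, as the comment above notes. Below is a minimal sketch using the standard OpenAI client; the prompt is illustrative, and the endpoint and model name follow the commands above.

```python
# Minimal sketch: call the Ollama-served model via its OpenAI-compatible API.
# Endpoint and model name follow the ollama commands above; the prompt is illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string works for a local server
)

response = client.chat.completions.create(
    model="internlm/interns1",
    messages=[{"role": "user", "content": "Briefly introduce yourself."}],
)
print(response.choices[0].message.content)
```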
 
229
 
230
  A key advantage for developers is that a growing number of open-source LLMs are designed to be compatible with the OpenAI API. This means you can leverage the same familiar syntax and structure from the OpenAI library to implement tool calling with these open-source models. As a result, the code demonstrated in this tutorial is versatile—it works not just with OpenAI models, but with any model that follows the same interface standard.
231
 
232
+ To illustrate how this works, let's dive into a practical code example that uses tool calling to get the latest weather forecast (based on the LMDeploy API server).
233
 
234
  ```python
235
+
236
  from openai import OpenAI
237
  import json
238
 
 
357
  temperature=0.8,
358
  top_p=0.8,
359
  stream=False,
360
+ extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
361
  tools=tools)
362
  print(response.choices[0].message)
363
  messages.append(response.choices[0].message)
 
379
  temperature=0.8,
380
  top_p=0.8,
381
  stream=False,
382
+ extra_body=dict(spaces_between_special_tokens=False, enable_thinking=False),
383
  tools=tools)
384
  print(response.choices[0].message.content)
385
  ```
 
386
 
387
  ### Switching Between Thinking and Non-Thinking Modes
388
 
 
443
  extra_body={
444
  "chat_template_kwargs": {"enable_thinking": false}
445
  }
446
+ ```
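
Putting it together, a complete non-thinking request against an OpenAI-compatible vLLM or SGLang server could look like the minimal sketch below; the base URL and model name are placeholders for your own deployment, and note that the flag becomes `False` in Python.

```python
# Minimal sketch: disable thinking mode through chat_template_kwargs on an
# OpenAI-compatible server (vLLM/SGLang). Base URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internlm/Intern-S1",
    messages=[{"role": "user", "content": "Explain tool calling in one sentence."}],
    extra_body={
        "chat_template_kwargs": {"enable_thinking": False}  # Python bool, not JSON `false`
    },
)
print(response.choices[0].message.content)
```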