Update README.md
README.md CHANGED

@@ -24,7 +24,7 @@ pipeline_tag: image-text-to-text
 ## Introduction
 
 We introduce **Intern-S1-mini**, a lightweight open-source multimodal reasoning model based on the same techniques as **[Intern-S1](https://huggingface.co/internlm/Intern-S1)**.
 
-Built upon
+Built upon an 8B dense language model (Qwen3) and a 400M Vision encoder (InternViT), Intern-S1-mini has been further pretrained on **5 trillion tokens** of multimodal data, including over **2.5 trillion scientific-domain tokens**. This enables the model to retain strong general capabilities while excelling in specialized scientific domains such as **interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes**, making Intern-S1-mini a capable research assistant for real-world scientific applications.
 
 ## Features
 
@@ -32,7 +32,7 @@ Built upon a 8B dense language model (Qwen3) and a 400M Vision encoder (InternVi
 
 - Continuously pretrained on a massive 5T token dataset, with over 50% specialized scientific data, embedding deep domain expertise.
 
-- Dynamic tokenizer enables native understanding of molecular formulas
+- Dynamic tokenizer enables native understanding of molecular formulas and protein sequences.
 
 ## Performance
 
@@ -139,7 +139,7 @@ print(decoded_output)
 
 #### Video input
 
-Please ensure that the decord video decoding library is installed via `pip install decord`. To avoid OOM, please at least
+Please ensure that the decord video decoding library is installed via `pip install decord`. To avoid OOM, please install flash_attention and use at least 2 GPUs.
 
 ```python
 from transformers import AutoProcessor, AutoModelForCausalLM
 
@@ -385,7 +385,7 @@ print(response.choices[0].message.content)
 
 ### Switching Between Thinking and Non-Thinking Modes
 
-Intern-S1 enables thinking mode by default, enhancing the model's reasoning capabilities to generate higher-quality responses. This feature can be disabled by setting `enable_thinking=False` in `tokenizer.apply_chat_template`
+Intern-S1-mini enables thinking mode by default, enhancing the model's reasoning capabilities to generate higher-quality responses. This feature can be disabled by setting `enable_thinking=False` in `tokenizer.apply_chat_template`.
 
 ```python
 text = tokenizer.apply_chat_template(
 
@@ -396,7 +396,7 @@ text = tokenizer.apply_chat_template(
 )
 ```
 
-With LMDeploy serving Intern-S1 models, you can dynamically control the thinking mode by adjusting the `enable_thinking` parameter in your requests.
+With LMDeploy serving Intern-S1-mini models, you can dynamically control the thinking mode by adjusting the `enable_thinking` parameter in your requests.
 
 ```python
 from openai import OpenAI