RangiLyu committed
Commit 488dce9 · verified · 1 Parent(s): 365c1ce

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -18,7 +18,7 @@ pipeline_tag: image-text-to-text
 ## Introduction

 We introduce **Intern-S1**, our **most advanced open-source multimodal reasoning model** to date. Intern-S1 combines **strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks**, rivaling leading closed-source commercial models.
- Built upon a 235B MoE language model and a 6B Vision encoder, Intern-S1 has been further pretrained on **5 trillion tokens** of multimodal data, including over **2.5 trillion scientific-domain tokens**. This enables the model to retain strong general capabilities while excelling in specialized scientific domains such as **interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes**, making Intern-S1 to be a capable research assistant for real-world scientific applications.
+ Built upon a 235B MoE language model (Qwen3) and a 6B Vision encoder (InternViT), Intern-S1 has been further pretrained on **5 trillion tokens** of multimodal data, including over **2.5 trillion scientific-domain tokens**. This enables the model to retain strong general capabilities while excelling in specialized scientific domains such as **interpreting chemical structures, understanding protein sequences, and planning compound synthesis routes**, making Intern-S1 to be a capable research assistant for real-world scientific applications.
 Features

 - Strong performance across language and vision reasoning benchmarks, especially scientific tasks.
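
For context, the README this commit edits describes an image-text-to-text model. Below is a minimal sketch of how such a checkpoint is typically loaded and queried with the `transformers` Auto classes; the repo id `internlm/Intern-S1`, the use of `trust_remote_code`, and the example prompt are assumptions for illustration and are not part of this diff, so check the model card itself for the supported usage.

```python
# Minimal sketch, NOT taken from this commit: assumes the model is published as
# "internlm/Intern-S1" and loads via standard transformers Auto classes.
from transformers import AutoProcessor, AutoModelForCausalLM

model_name = "internlm/Intern-S1"  # assumed Hugging Face repo id
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",    # shard the large MoE across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,
)

# Text-only prompt touching one of the scientific domains named in the README.
messages = [
    {"role": "user",
     "content": "What molecule does the SMILES string CC(=O)OC1=CC=CC=C1C(=O)O describe?"}
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```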