OpenGVLab/InternVL2-2B-AWQ

Image-Text-to-Text

feature-extraction

czczup committed on Jul 15, 2024 · Commit b252962 · verified · 1 Parent(s): bba800e

Upload folder using huggingface_hub

Files changed (1)

README.md +11 -3

README.md CHANGED

@@ -3,11 +3,19 @@ license: mit
 pipeline_tag: image-text-to-text
 ---
 <div align="center">
   <img src="https://raw.githubusercontent.com/InternLM/lmdeploy/0be9e7ab6fe9a066cfb0a09d0e0c8d2e28435e58/resources/lmdeploy-logo.svg" width="450"/>
 </div>
-# INT4 Weight-only Quantization and Deployment (W4A16)
 LMDeploy adopts the [AWQ](https://arxiv.org/abs/2306.00978) algorithm for 4-bit weight-only quantization. With a high-performance CUDA kernel, inference with the 4-bit quantized model runs up to 2.4x faster than FP16.
@@ -34,7 +42,7 @@ This article comprises the following sections:
 <!-- tocstop -->
-## Inference
 The following code performs batched offline inference with the quantized model:
@@ -56,7 +64,7 @@ print(response.text)
 For more information about the pipeline parameters, please refer to [here](https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/pipeline.md).
-## Service
 To deploy InternVL2 as an API, first configure the chat template. Create the following JSON file, `chat_template.json`.

 pipeline_tag: image-text-to-text
 ---
+# InternVL2-2B-AWQ
+[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL)  [\[🆕 Blog\]](https://internvl.github.io/blog/)  [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238)  [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
+[\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)  [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL)  [\[🚀 Quick Start\]](#quick-start)  [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971)  \[🌟 [魔搭社区](https://modelscope.cn/organization/OpenGVLab) | [教程](https://mp.weixin.qq.com/s/OUaVLkxlk1zhFb1cvMCFjg) \]
+## Introduction
 <div align="center">
   <img src="https://raw.githubusercontent.com/InternLM/lmdeploy/0be9e7ab6fe9a066cfb0a09d0e0c8d2e28435e58/resources/lmdeploy-logo.svg" width="450"/>
 </div>
+### INT4 Weight-only Quantization and Deployment (W4A16)
 LMDeploy adopts the [AWQ](https://arxiv.org/abs/2306.00978) algorithm for 4-bit weight-only quantization. With a high-performance CUDA kernel, inference with the 4-bit quantized model runs up to 2.4x faster than FP16.
 <!-- tocstop -->
+### Inference
 The following code performs batched offline inference with the quantized model:
 For more information about the pipeline parameters, please refer to [here](https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/pipeline.md).
+### Service
 To deploy InternVL2 as an API, first configure the chat template. Create the following JSON file, `chat_template.json`.