Update README.md
README.md CHANGED
@@ -2,7 +2,7 @@
 license: cc-by-nc-4.0
 base_model: Qwen/Qwen2-7B-Instruct
 model-index:
-- name: Dolphin
+- name: Squid
   results: []
 tags:
 - RAG
@@ -14,7 +14,7 @@ spaces: false
 language:
 - en
 ---
-# Dolphin: Long Context as a New Modality for on-device RAG
+# Squid: Long Context as a New Modality for on-device RAG

 <p align="center">
   <a href="https://www.nexaai.com/models" target="_blank">Nexa Model Hub</a>
@@ -26,7 +26,7 @@ language:
 </p>

 ## Overview
-Dolphin is a novel approach to accelerate language model inference by treating long context as a new modality, similar to image, audio, and video modalities in vision-language models. This innovative method incorporates a language encoder model to encode context information into embeddings, applying multimodal model concepts to enhance the efficiency of language model inference. Below are the model highlights:
+Squid is a novel approach to accelerate language model inference by treating long context as a new modality, similar to image, audio, and video modalities in vision-language models. This innovative method incorporates a language encoder model to encode context information into embeddings, applying multimodal model concepts to enhance the efficiency of language model inference. Below are the model highlights:
 - 🧠 Context as a distinct modality
 - 🗜️ Language encoder for context compression
 - 🔗 Multimodal techniques applied to language processing
@@ -34,7 +34,7 @@ Dolphin is a novel approach to accelerate language model inference by treating l
 - 📜 Specialized for long context understanding

 ## Model Architecture
-Dolphin employs a decoder-decoder framework with two main components:
+Squid employs a decoder-decoder framework with two main components:
 1. A smaller decoder (0.5B parameters) for transforming information from extensive contexts
 2. A larger decoder (7B parameters) for comprehending and generating responses to current queries
 3. The architecture also includes a projector to align embeddings between the text encoder and the main decoder.
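The decoder-decoder framework described in this hunk is straightforward to sketch in code. Below is a minimal illustration of the idea, assuming Hugging Face-style decoder modules that accept `inputs_embeds` and expose `last_hidden_state`; the class name, the MLP projector, the learned memory slots, and the hidden sizes are illustrative assumptions, not the released Squid implementation.

```python
import torch
import torch.nn as nn

class ContextAsModality(nn.Module):
    """Hypothetical sketch of a decoder-decoder pipeline: a small decoder
    compresses a long context into a few embeddings, a projector aligns
    them with the main decoder, and the large decoder answers the query."""

    def __init__(self, small_decoder, large_decoder,
                 small_dim=896, large_dim=3584, num_memory_tokens=64):
        super().__init__()
        self.small_decoder = small_decoder   # e.g. a 0.5B-class causal LM
        self.large_decoder = large_decoder   # e.g. a 7B-class causal LM
        # Learned slots appended after the context; their final hidden
        # states serve as the compressed representation of the context.
        self.memory_tokens = nn.Parameter(torch.randn(num_memory_tokens, small_dim))
        # Projector aligning the small decoder's outputs with the main
        # decoder's embedding space, as in vision-language models
        # (a two-layer MLP here, chosen arbitrarily).
        self.projector = nn.Sequential(
            nn.Linear(small_dim, large_dim), nn.GELU(),
            nn.Linear(large_dim, large_dim),
        )

    def forward(self, context_embeds, query_embeds):
        # 1) Small decoder reads [long context ; memory slots].
        mem = self.memory_tokens.unsqueeze(0).expand(context_embeds.size(0), -1, -1)
        hidden = self.small_decoder(
            inputs_embeds=torch.cat([context_embeds, mem], dim=1)
        ).last_hidden_state
        compressed = hidden[:, -mem.size(1):, :]   # keep only the slot states
        # 2) Project into the large decoder's embedding space.
        context_tokens = self.projector(compressed)
        # 3) Large decoder attends over [compressed context ; query tokens],
        #    so its sequence length no longer grows with the raw context.
        return self.large_decoder(
            inputs_embeds=torch.cat([context_tokens, query_embeds], dim=1)
        )
```

Whatever the raw context length, only `num_memory_tokens` projected embeddings reach the 7B decoder, which is where the claimed inference efficiency for long-context and on-device RAG workloads would come from.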
@@ -131,7 +131,7 @@ If you use Dolphin in your research, please cite our paper:

 ```bibtex
 @article{chen2024dolphinlongcontextnew,
-      title={Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models},
+      title={Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models},
       author={Wei Chen and Zhiyuan Li and Shuo Xin and Yihao Wang},
       year={2024},
       eprint={2408.15518},