Update README.md
## 4. Model Architecture
DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:

- For attention, we design MLA (Multi-head Latent Attention), which utilizes low-rank key-value joint compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference (a minimal sketch follows this list).
- For Feed-Forward Networks (FFNs), we adopt the DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs (also sketched below).
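
The key idea in the first bullet, low-rank joint compression of keys and values, can be illustrated in a few lines of PyTorch. The sketch below is illustrative only, not the DeepSeek-V2 implementation: every dimension is hypothetical, and MLA details such as RoPE handling, query compression, and causal masking are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankKVAttention(nn.Module):
    # Sketch of MLA's core idea: cache one small latent per token and
    # up-project it to per-head keys/values at attention time.
    # All sizes are hypothetical; RoPE and masking are omitted for brevity.
    def __init__(self, d_model=1024, n_heads=8, d_head=128, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.q_proj = nn.Linear(d_model, n_heads * d_head, bias=False)
        # Down-projection: keys and values are jointly compressed into one latent.
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Up-projections: reconstruct per-head keys and values from the latent.
        self.k_up = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.v_up = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.o_proj = nn.Linear(n_heads * d_head, d_model, bias=False)

    def forward(self, x, kv_cache=None):
        # x: (batch, new_tokens, d_model)
        b, t, _ = x.shape
        latent = self.kv_down(x)                           # (b, t, d_latent)
        if kv_cache is not None:                           # only the latent is cached
            latent = torch.cat([kv_cache, latent], dim=1)
        s = latent.shape[1]
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)      # causal mask omitted
        out = out.transpose(1, 2).reshape(b, t, self.n_heads * self.d_head)
        return self.o_proj(out), latent                    # latent doubles as the new cache
```

With these toy numbers, the per-token cache shrinks from 2 * 8 * 128 = 2048 floats (full multi-head keys and values) to the 128-float latent, which is the inference-time bottleneck the bullet refers to.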
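For the second bullet, here is a similarly hedged sketch of two ingredients commonly attributed to DeepSeekMoE: many fine-grained routed experts plus a few always-active shared experts. Expert counts, sizes, and the gating scheme are assumptions for illustration; auxiliary load-balancing losses and batched expert dispatch are omitted.

```python
import torch
import torch.nn as nn

class DeepSeekMoESketch(nn.Module):
    # Sketch of fine-grained routed experts plus always-on shared experts.
    # Expert counts and sizes are made up; the per-token loop is for clarity
    # (real implementations batch the dispatch).
    def __init__(self, d_model=1024, d_expert=256, n_routed=16, n_shared=2, top_k=4):
        super().__init__()
        self.top_k = top_k
        ffn = lambda: nn.Sequential(
            nn.Linear(d_model, d_expert), nn.SiLU(), nn.Linear(d_expert, d_model))
        self.routed = nn.ModuleList(ffn() for _ in range(n_routed))
        self.shared = nn.ModuleList(ffn() for _ in range(n_shared))
        self.gate = nn.Linear(d_model, n_routed, bias=False)

    def forward(self, x):
        # x: (tokens, d_model); shared experts process every token.
        out = sum(e(x) for e in self.shared)
        weight, idx = self.gate(x).softmax(-1).topk(self.top_k, dim=-1)
        routed_out = torch.zeros_like(x)
        for t in range(x.shape[0]):        # each token visits only its top-k experts
            for w, i in zip(weight[t], idx[t]):
                routed_out[t] += w * self.routed[int(i)](x[t])
        return out + routed_out
```

Because only `top_k` of the routed experts run for any given token, total parameter count grows much faster than per-token compute, which is how an MoE FFN can train a stronger model at lower cost.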
<p align="center">