Update README.md
Doge uses Dynamic Mask Attention as its sequence transformation and can use either a Multi-Layer Perceptron or a Cross Domain Mixture of Experts as its state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and a state-space formulation during inference, while the Cross Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. This model is trained by the [SmallDoge](https://huggingface.co/SmallDoge) community. A paper detailing the algorithm and model architecture is coming soon; all training details and code are available in the [small-doge](https://github.com/SmallDoges/small-doge) repository.
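
As a quick orientation, below is a minimal sketch of how such a checkpoint might be loaded with the `transformers` library. The model ID `SmallDoge/Doge-20M` and the use of `trust_remote_code=True` are assumptions for illustration and may differ from the actual checkpoints; see the Uses section for the maintained instructions.

```python
# Minimal sketch: loading a Doge checkpoint with the transformers library.
# The model ID below is an assumed example; substitute the actual checkpoint
# name from the SmallDoge organization on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SmallDoge/Doge-20M"  # assumed example checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is assumed here in case the architecture ships as
# custom modeling code rather than being built into transformers.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Hey, how are you doing today?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```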
## Uses