Update README.md
README.md
CHANGED
@@ -8,10 +8,9 @@ inference: false
 
 # Monarch Mixer-BERT
 
-
-This model has been pretrained with sequence length 2048, and it has been fine-tuned for long-context retrieval.
+An 80M checkpoint of M2-BERT, pretrained with sequence length 2048 and fine-tuned for long-context retrieval.
 
-Check out our [blog post]() for more on how we trained this model for long sequence.
+Check out the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109) and our [blog post]() on retrieval for more on how we trained this model for long sequences.
 
 This model was trained by Jon Saad-Falcon, Dan Fu, and Simran Arora.
 
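For context on the checkpoint the updated card describes, here is a minimal usage sketch. It is not part of this commit: the repository id is a placeholder, the BERT tokenizer choice and the `trust_remote_code=True` flag are assumptions based on M2-BERT shipping custom modeling code, and the exact output field for the retrieval embedding depends on that custom code.

```python
# Minimal sketch, assuming the checkpoint lives on the Hugging Face Hub
# (placeholder repo id) and loads through the standard transformers API.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "<org>/m2-bert-80M-2k-retrieval"  # placeholder, not from this diff

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id, trust_remote_code=True  # M2-BERT is assumed to use custom modeling code
)

# Pad to the 2048-token context length the card says the model was pretrained with.
inputs = tokenizer(
    "Monarch Mixer is a sub-quadratic, GEMM-based architecture.",
    return_tensors="pt",
    padding="max_length",
    max_length=2048,
)
with torch.no_grad():
    outputs = model(**inputs)

# Inspect the returned object; the retrieval embedding's field name is defined
# by the checkpoint's custom code, so it is not hard-coded here.
print(outputs)
```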