Update README.md
README.md
CHANGED
@@ -8,10 +8,9 @@ inference: false
 
 # Monarch Mixer-BERT
 
-
-This model has been pretrained with sequence length 2048, and it has been fine-tuned for long-context retrieval.
+An 80M checkpoint of M2-BERT, pretrained with sequence length 2048 and fine-tuned for long-context retrieval.
 
-Check out our [blog post]() for more on how we trained this model for long sequence.
+Check out the paper [Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture](https://arxiv.org/abs/2310.12109) and our [blog post]() on retrieval for more on how we trained this model for long sequences.
 
 This model was trained by Jon Saad-Falcon, Dan Fu, and Simran Arora.
 
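For context on the checkpoint the updated card describes, here is a minimal usage sketch. It is not part of this commit: the repository id is a placeholder, the BERT tokenizer choice and the `trust_remote_code=True` flag are assumptions based on M2-BERT shipping custom modeling code, and the exact output field for the retrieval embedding depends on that custom code.

```python
# Minimal sketch, assuming the checkpoint lives on the Hugging Face Hub
# (placeholder repo id) and loads through the standard transformers API.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "<org>/m2-bert-80M-2k-retrieval"  # placeholder, not from this diff

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    repo_id, trust_remote_code=True  # M2-BERT is assumed to use custom modeling code
)

# Pad to the 2048-token context length the card says the model was pretrained with.
inputs = tokenizer(
    "Monarch Mixer is a sub-quadratic, GEMM-based architecture.",
    return_tensors="pt",
    padding="max_length",
    max_length=2048,
)
with torch.no_grad():
    outputs = model(**inputs)

# Inspect the returned object; the retrieval embedding's field name is defined
# by the checkpoint's custom code, so it is not hard-coded here.
print(outputs)
```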