Task description for clustering
Hi! Thanks for this new family of models, great work!
I've seen that, for a typical retrieval task, you recommend prepending the task description to the query and leaving the documents as they are (without an instruction) when computing the embeddings.
What about clustering tasks? In this case the problem is not unidirectional (given a query, I want to find the relevant documents) but bidirectional (I want to compute the similarities over the Cartesian product of all documents). Should I add the task description to every document that I feed to the clustering algorithm?
So, for example, a task description could be: "Given a document, find all the documents that describe the same topic bla bla bla".
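To make the question concrete, here is a minimal sketch of the two setups I have in mind. The `Instruct: ...\nQuery: ...` template and the helper names are just my assumptions for illustration, not necessarily the format you use; the point is only that retrieval prefixes the queries alone, while my clustering guess prefixes every document with the same task description.

```python
from typing import List, Tuple


def add_instruction(task: str, text: str) -> str:
    # Hypothetical template: the exact prompt format is an assumption
    # and should be replaced by whatever the model card specifies.
    return f"Instruct: {task}\nQuery: {text}"


def prepare_retrieval_inputs(
    task: str, queries: List[str], documents: List[str]
) -> Tuple[List[str], List[str]]:
    # Asymmetric case: only the queries get the task description,
    # the documents are passed through unchanged.
    return [add_instruction(task, q) for q in queries], list(documents)


def prepare_clustering_inputs(task: str, documents: List[str]) -> List[str]:
    # Symmetric case (my guess): every document gets the same task
    # description, since every document is compared with every other one.
    return [add_instruction(task, d) for d in documents]


clustering_task = "Given a document, find all the documents that describe the same topic"
docs = ["The cat sat on the mat.", "Stock markets fell sharply today."]

# These strings would then be passed to the embedding model.
clustering_inputs = prepare_clustering_inputs(clustering_task, docs)
```

Is the symmetric variant the right way to prepare the inputs before embedding and clustering?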
Furthermore, can you share all the task instructions that you used for running the MTEB benchmark? Where can I find them? Are they in your GitHub repo?
Thanks in advance!