ashercn97
posted an update Mar 10, 2025
Does anyone know what the SOTA in text embedding is? Specifically for sentence similarity and clustering?
I think that the MTEB leaderboard is super complex. I feel lost looking at it (what metric should I judge by?)
I would say, sort by "Mean (Task)" and pick one of the top models. Or, if you can, compare three of the best on your own data. That holds unless you need a longer context, or you work in medicine or a similar field where domain-specific models exist.
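In case it helps, here is a rough sketch of what "compare a few models on your own data" could look like for sentence similarity: embed each pair, take the cosine similarity, and check how well it tracks your human labels using Spearman rank correlation. Everything here is an assumption for illustration: `rank_models`, `score_model`, and the toy embedders are made-up names, and the toy embedders are stand-ins for whatever real models you would plug in as plain `text -> vector` functions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Embedder = Callable[[str], Sequence[float]]
Pair = Tuple[str, str, float]  # (text_a, text_b, human similarity label)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def score_model(embed: Embedder, pairs: Sequence[Pair]) -> float:
    predicted = [cosine(embed(a), embed(b)) for a, b, _ in pairs]
    gold = [label for _, _, label in pairs]
    return spearman(predicted, gold)


def rank_models(models: Dict[str, Embedder], pairs: Sequence[Pair]) -> List[Tuple[str, float]]:
    """Return (name, score) sorted best first, i.e. a tiny personal leaderboard."""
    scores = {name: score_model(fn, pairs) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy embedders, only here so the sketch runs without any model.
def char_count_embed(text: str) -> List[float]:
    return [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]


def length_embed(text: str) -> List[float]:
    return [float(len(text)), 1.0]


if __name__ == "__main__":
    my_pairs = [
        ("the cat sat", "a cat sat down", 0.9),
        ("reset my password", "I forgot my password", 0.8),
        ("the cat sat", "stock prices fell", 0.1),
    ]
    models = {"char_count": char_count_embed, "length": length_embed}
    for name, score in rank_models(models, my_pairs):
        print(f"{name}: {score:.3f}")
```

To try real candidates, you would swap the toy functions for wrappers around the three models you picked from the leaderboard and feed in labeled pairs from your own domain. For clustering you would need a different score, but the same "one function per model, one number per model" shape should carry over.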
Oh wait this makes sense.
I have created some benchmarks from user data, so maybe I'll make my own leaderboard haha.
Thanks for the help!
Hey, as of now Gemini Embeddings is #1: https://developers.googleblog.com/en/gemini-embedding-text-model-now-available-gemini-api/
Yes, I've seen! Thank you. My issue is the limit of 100 requests a day.
Oh, this is good to know!