Which loss function was used to fine-tune this model? Euclidean distance or cosine similarity?
We used cosine similarity, normalizing the model's output (last-token pooling).
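For anyone who wants to reproduce the scoring side of this, here is a minimal sketch of last-token pooling followed by normalization and cosine similarity. It is not the training code: the function names, the toy hidden states, and the assumption of right-padded inputs are all illustrative, and the exact loss built on top of the similarity is not stated in this thread.

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token per sequence.

    Assumes right padding, so the last real token sits at sum(mask) - 1.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last_index = sum(mask) - 1
        pooled.append(states[last_index])
    return pooled

def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def cosine_similarity(a, b):
    """Dot product of the two unit-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy "model outputs": 2 sequences, 3 token positions, hidden size 2.
hidden_states = [
    [[1.0, 0.0], [3.0, 4.0], [0.0, 0.0]],  # third position is padding
    [[0.0, 1.0], [1.0, 1.0], [6.0, 8.0]],  # no padding
]
attention_mask = [[1, 1, 0], [1, 1, 1]]

embeddings = last_token_pool(hidden_states, attention_mask)
similarity = cosine_similarity(embeddings[0], embeddings[1])
print(similarity)
```

With left-padded batches the last real token is simply the final position, so the index computation would need to change accordingly.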