sentence-transformers-from-synthetic-data
Example of using distilabel to generate synthetic triplet data for fine-tuning a Sentence Transformers model
Note: Input dataset for generating synthetic data. We use the `instruction` column as a starting point.
davanstrien/similarity-dataset-sc2-8b
Note: This dataset was generated by our pipeline. The `instruction` column from the input dataset becomes the anchor, and a positive and a negative example are generated for each anchor. The result is a triplet dataset we can use to train a Sentence Transformers model. You can find the code used here: https://github.com/davanstrien/awesome-synthetic-datasets
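To make the anchor/positive/negative structure concrete, here is a minimal sketch of how rows with an `instruction` column could be turned into triplets. It is not the pipeline code from the repository: `build_triplets` and `fake_generate_pair` are hypothetical names, and `fake_generate_pair` is only a stand-in for the LLM generation step that distilabel would handle.

```python
from typing import Callable, Dict, Iterable, List


def build_triplets(
    rows: Iterable[Dict[str, str]],
    generate_pair: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Turn rows with an `instruction` column into anchor/positive/negative triplets."""
    triplets = []
    for row in rows:
        anchor = row["instruction"].strip()
        if not anchor:
            continue  # skip rows with an empty instruction
        pair = generate_pair(anchor)
        triplets.append(
            {
                "anchor": anchor,
                "positive": pair["positive"],
                "negative": pair["negative"],
            }
        )
    return triplets


def fake_generate_pair(anchor: str) -> Dict[str, str]:
    # Hypothetical stand-in for the LLM call: a real pipeline would ask a model
    # for a similar sentence (positive) and a dissimilar one (negative).
    return {
        "positive": f"Rephrased: {anchor}",
        "negative": "Write a haiku about autumn.",
    }


rows = [
    {"instruction": "Write a function that reverses a string."},
    {"instruction": "   "},  # dropped: no usable anchor
]
triplets = build_triplets(rows, fake_generate_pair)
```

Each resulting record has exactly the three fields a triplet-based training setup expects, so the list can be written out as a dataset with `anchor`, `positive`, and `negative` columns.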
davanstrien/code-prompt-similarity-model
Note: A Sentence Transformers model (Sentence Similarity, 0.1B parameters) fine-tuned on the above dataset. You can see that minimal fine-tuning gives a nice bump in performance.
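As a rough illustration of what training on triplets optimizes, the sketch below computes a triplet margin loss on toy 2-D embeddings in plain Python. The collection does not state which loss was used for this model; the cosine distance and the `margin` of 0.5 are assumptions chosen for the example. The Sentence Transformers library ships its own loss classes (for example `TripletLoss`) that would be used in practice.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.5,  # assumed value, for illustration only
) -> float:
    """Zero once the positive is closer to the anchor than the negative by at least `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings: the positive coincides with the anchor, the negative is orthogonal.
well_separated = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

# Same vectors with the roles swapped: the "positive" is now the far one.
poorly_separated = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

The first triplet incurs no loss, while the second is penalized. Fine-tuning adjusts the embedding model so that more triplets look like the first case.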
davanstrien/abstract-wiki