## ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

This is an ORPO fine-tune of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified).

⚠️ Note that the code is still experimental, as the `ORPOTrainer` PR has not been merged yet; follow its progress
at [🤗`trl` - `ORPOTrainer` PR](https://github.com/huggingface/trl/pull/1435).

## About the fine-tuning

In order to fine-tune [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) using ORPO, the branch
`orpo` from [🤗`trl`](https://github.com/huggingface/trl) has been used, thanks to the invaluable and quick contribution of
@kashif.

📅 Fine-tuning code will be shared soon!
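
In the meantime, below is a minimal sketch of what the setup could look like. Everything in it is an assumption: the `ORPOConfig` / `ORPOTrainer` names follow the open PR and may change before it is merged, and the hyperparameters and output path are illustrative placeholders, not the ones used for this model.

```python
# Sketch only: install and API follow the unmerged `ORPOTrainer` PR and may change, e.g.:
#   pip install git+https://github.com/huggingface/trl.git@orpo
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # the base tokenizer ships without a pad token

# `prompt`, `chosen`, and `rejected` come pre-split in this dataset; depending
# on the final trainer API, the chat template may still need to be applied
# first (see "About the dataset" below).
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

args = ORPOConfig(
    output_dir="./mistral-7b-orpo",  # placeholder output path
    beta=0.1,                        # weight of the odds-ratio term (λ in the ORPO paper)
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```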

## About the dataset

The dataset used for this fine-tune is [`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified),
which is a simplified version of [`argilla/dpo-mix-7k`](https://huggingface.co/datasets/argilla/dpo-mix-7k).

The simplification comes from the fact that the `prompt` column is detached from both the `chosen` and `rejected`
columns, so that there's no need for extra pre-processing when applying the chat template to the dataset before
fine-tuning. Other than that, the dataset remains as is, with an additional column for the `prompt`.
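
As an illustration of the pre-processing this saves, the snippet below applies a chat template column by column. The column layout is an assumption (`prompt` as a plain string, `chosen` / `rejected` as lists of chat messages), and since the base Mistral tokenizer ships without a chat template, a minimal ChatML-style one is set here purely for the example.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# Assumed template: the base tokenizer has none, so set a minimal ChatML one.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

def format_example(example):
    # No column juggling needed: the prompt is already detached, so each
    # column can be templated independently.
    example["prompt"] = tokenizer.apply_chat_template(
        [{"role": "user", "content": example["prompt"]}],
        tokenize=False,
        add_generation_prompt=True,
    )
    example["chosen"] = tokenizer.apply_chat_template(example["chosen"], tokenize=False)
    example["rejected"] = tokenizer.apply_chat_template(example["rejected"], tokenize=False)
    return example

dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")
dataset = dataset.map(format_example)
```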

The dataset is a small cocktail combining Argilla's latest efforts on DPO datasets, mixing the following datasets:

* [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized)
* [`argilla/distilabel-intel-orca-dpo-pairs`](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs)
* [`argilla/ultrafeedback-binarized-preferences-cleaned`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)

The samples have been randomly selected from the original datasets with a proportion of 0.33 each, as can be seen via the `dataset` column.
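
These mixing proportions can be checked quickly from the `dataset` column, assuming it holds the source dataset name for each row as described above:

```python
# Quick check of the ~0.33 mixing proportions via the `dataset` column.
from collections import Counter
from datasets import load_dataset

dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")
for source, count in Counter(dataset["dataset"]).items():
    print(f"{source}: {count / len(dataset):.2%}")
```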

For more information about the original dataset, check [the `README.md` file of `argilla/dpo-mix-7k`](https://huggingface.co/datasets/argilla/dpo-mix-7k/blob/main/README.md).