## ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

This is an ORPO fine-tune of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified).

⚠️ Note that the code is still experimental, as the `ORPOTrainer` PR has not been merged yet; follow its progress
at [🤗`trl` - `ORPOTrainer` PR](https://github.com/huggingface/trl/pull/1435).

## About the fine-tuning

In order to fine-tune [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) using ORPO, the branch
`orpo` from [🤗`trl`](https://github.com/huggingface/trl) has been used, thanks to the invaluable and quick contribution of
@kashif.

📅 Fine-tuning code will be shared soon!
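
In the meantime, below is a minimal sketch of what the setup could look like. Everything in it is an assumption: the `ORPOConfig` / `ORPOTrainer` names follow the open PR and may change before it is merged, and the hyperparameters and output path are illustrative placeholders, not the ones used for this model.

```python
# Sketch only: install and API follow the unmerged `ORPOTrainer` PR and may change, e.g.:
#   pip install git+https://github.com/huggingface/trl.git@orpo
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # the base tokenizer ships without a pad token

# `prompt`, `chosen`, and `rejected` come pre-split in this dataset; depending
# on the final trainer API, the chat template may still need to be applied
# first (see "About the dataset" below).
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

args = ORPOConfig(
    output_dir="./mistral-7b-orpo",  # placeholder output path
    beta=0.1,                        # weight of the odds-ratio term (λ in the ORPO paper)
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```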

## About the dataset

The dataset used for this fine-tune is [`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified),
which is a simplified version of [`argilla/dpo-mix-7k`](https://huggingface.co/datasets/argilla/dpo-mix-7k).

The simplification comes from the fact that the `prompt` column is detached from both the `chosen` and `rejected`
columns, so that there's no need for extra pre-processing when applying the chat template to the dataset before
fine-tuning. Other than that, the dataset remains as is, with an additional column for the `prompt`.
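
As an illustration of the pre-processing this saves, the snippet below applies a chat template column by column. The column layout is an assumption (`prompt` as a plain string, `chosen` / `rejected` as lists of chat messages), and since the base Mistral tokenizer ships without a chat template, a minimal ChatML-style one is set here purely for the example.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# Assumed template: the base tokenizer has none, so set a minimal ChatML one.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

def format_example(example):
    # No column juggling needed: the prompt is already detached, so each
    # column can be templated independently.
    example["prompt"] = tokenizer.apply_chat_template(
        [{"role": "user", "content": example["prompt"]}],
        tokenize=False,
        add_generation_prompt=True,
    )
    example["chosen"] = tokenizer.apply_chat_template(example["chosen"], tokenize=False)
    example["rejected"] = tokenizer.apply_chat_template(example["rejected"], tokenize=False)
    return example

dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")
dataset = dataset.map(format_example)
```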

The dataset is a small cocktail combining Argilla's latest efforts on DPO datasets, mixing the following datasets:

* [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized)
* [`argilla/distilabel-intel-orca-dpo-pairs`](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs)
* [`argilla/ultrafeedback-binarized-preferences-cleaned`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)

The samples have been randomly selected from the original datasets with a proportion of 0.33 each, as can be seen via the `dataset` column.
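
These mixing proportions can be checked quickly from the `dataset` column, assuming it holds the source dataset name for each row as described above:

```python
# Quick check of the ~0.33 mixing proportions via the `dataset` column.
from collections import Counter
from datasets import load_dataset

dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")
for source, count in Counter(dataset["dataset"]).items():
    print(f"{source}: {count / len(dataset):.2%}")
```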

For more information about the original dataset, check [the `README.md` file of `argilla/dpo-mix-7k`](https://huggingface.co/datasets/argilla/dpo-mix-7k/blob/main/README.md).