## ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

This is an ORPO fine-tune of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) on
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified).

⚠️ Note that the code is still experimental, as the `ORPOTrainer` PR has not been merged yet; follow its progress
at [🤗`trl` - `ORPOTrainer` PR](https://github.com/huggingface/trl/pull/1435).

## About the fine-tuning

In order to fine-tune [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) using ORPO, the `orpo` branch
of [🤗`trl`](https://github.com/huggingface/trl) has been used, thanks to the invaluable and quick contribution of @kashif.

📅 Fine-tuning code will be shared soon!
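
In the meantime, the snippet below is a minimal, illustrative sketch of what such an ORPO run could look like with the `ORPOConfig` / `ORPOTrainer` API proposed in the PR linked above; the hyperparameters, output path, and overall setup are placeholders rather than the configuration actually used for this model.

```python
# Illustrative sketch only (not the actual fine-tuning script): assumes trl is installed
# from the in-progress `orpo` branch of https://github.com/huggingface/trl/pull/1435,
# and all hyperparameters below are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The simplified dataset already exposes separate `prompt`, `chosen`, and `rejected` columns.
# NOTE: depending on the column format, a chat template may still need to be applied to turn
# message lists into strings before training (omitted here for brevity).
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

args = ORPOConfig(
    output_dir="mistral-7b-v0.1-orpo",
    beta=0.1,                        # placeholder value for the ORPO beta/lambda
    max_length=2048,
    max_prompt_length=1024,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```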

## About the dataset

The dataset used for this fine-tune is [`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified),
which is a simplified version of [`argilla/dpo-mix-7k`](https://huggingface.co/datasets/argilla/dpo-mix-7k).

The simplification comes from the fact that the `prompt` column is detached from both the `chosen` and `rejected`
columns, so there's no need for extra pre-processing when applying the chat template to the dataset before
fine-tuning. Other than that, the dataset remains as is, with an additional column for the `prompt`.
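
As a quick illustration of the layout described above, the sketch below loads the simplified dataset and inspects a single record; the `train` split name and the exact column names are assumptions to be checked against the dataset card.

```python
from datasets import load_dataset

# Minimal sketch: inspect the simplified dataset. The `prompt` column is expected to be
# separate from `chosen` and `rejected`, so no prompt-splitting is needed before applying
# a chat template. The "train" split name is an assumption.
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

print(dataset.column_names)  # expected to include "dataset", "prompt", "chosen", "rejected"
print(dataset[0]["prompt"])  # standalone prompt, detached from the responses
```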

The dataset is a small cocktail combining Argilla's latest efforts on DPO datasets, mixing the following sources:

* [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized)
* [`argilla/distilabel-intel-orca-dpo-pairs`](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs)
* [`argilla/ultrafeedback-binarized-preferences-cleaned`](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned)

The samples have been randomly selected from the original datasets with a proportion of 0.33 each, as can be seen in the `dataset` column.
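
As a hedged illustration of how that mix can be checked via the `dataset` column (again assuming a `train` split):

```python
from collections import Counter

from datasets import load_dataset

# Sketch: count how many samples come from each source dataset via the `dataset` column.
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")
counts = Counter(dataset["dataset"])
total = sum(counts.values())
for source, count in counts.most_common():
    print(f"{source}: {count} samples ({count / total:.0%})")
```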

For more information about the original dataset, check [the `README.md` file of `argilla/dpo-mix-7k`](https://huggingface.co/datasets/argilla/dpo-mix-7k/blob/main/README.md).