alvarobartt (HF Staff) committed · verified · Commit 8749fa0 · 1 Parent(s): 16f4d77

Update README.md

Files changed (1): README.md +21 -1
README.md CHANGED
@@ -28,7 +28,27 @@ In order to fine-tune [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1)
  `orpo` from [🤗`trl`](https://github.com/huggingface/trl) has been used, thanks to the invaluable and quick contribution of
  @kashif.
 
- 📅 Fine-tuning code will be shared soon!
+ ORPO stands for Odds Ratio Preference Optimization, and defines a new paradigm for fine-tuning LLMs, “combining” both the SFT
+ and the PPO/DPO stages into a single stage, thanks to the proposed loss function that starts off from a preference dataset, i.e.
+ chosen-rejected pairs.
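+
+ For intuition, the objective proposed in the paper simply adds a weighted log odds-ratio penalty on top of the standard SFT
+ loss; below is a sketch of the formulation following the paper’s notation, with $y_w$/$y_l$ the chosen/rejected responses,
+ $\sigma$ the sigmoid function, and $\lambda$ weighing the penalty:
+
+ $$
+ \mathcal{L}_{ORPO} = \mathbb{E}_{(x, y_w, y_l)} \left[ \mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR} \right],
+ \quad
+ \mathcal{L}_{OR} = -\log \sigma \left( \log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)} \right),
+ \quad
+ \text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
+ $$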
+
+ Some key features about ORPO:
+ - ⚡️ Faster to train as it’s now a single-stage fine-tuning
+ - 👨🏻‍🏫 Requires preference data, i.e. (prompt, chosen, rejected)-like datasets, as in the example record below
+ - ⬇️ Less memory than PPO/DPO as it doesn’t need a reference model
+ - 🏆 SOTA results for Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) when fine-tuned using single-turn UltraFeedback
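+
+ To make the expected format concrete, a preference record is just a triplet of strings; the record below is a hypothetical
+ example, not taken from any of the datasets mentioned here:
+
+ ```python
+ # Hypothetical (prompt, chosen, rejected) record, the format preference-based methods like ORPO consume
+ example = {
+     "prompt": "What is ORPO?",
+     "chosen": "ORPO stands for Odds Ratio Preference Optimization, a reference-model-free preference fine-tuning method.",
+     "rejected": "No idea, sorry!",
+ }
+ ```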
+
+ Some notes on the experiments mentioned in the paper:
+ - 📌 Up to 7B-parameter LLMs were fine-tuned, achieving better performance compared to their 7B counterparts and even 13B LLMs
+ - 📌 Not yet trained with multi-turn datasets such as Capybara (may be an interesting experiment to run)
+ - 📌 OPT models were fine-tuned with HH-RLHF from Anthropic, truncated and padded to 1024 tokens, filtering out the prompts with more than 1024 tokens
+ - 📌 Phi-2, Mistral (7B), and Llama 2 (7B) were fine-tuned with UltraFeedback from OpenBMB, truncated and padded to 2048 tokens, filtering out the prompts with more than 1024 tokens
+ - 📌 Fine-tuned for 10 epochs, using the evaluation loss as the metric for selecting the best models
+
+ For more information about ORPO, I highly recommend reading their paper titled [`ORPO: Monolithic Preference Optimization without Reference Model`](https://huggingface.co/papers/2403.07691),
+ as it contains a lot of information and details not only on the ORPO method, but also on the experiments they ran, the results they got, and much more.
+
+ 📅 Fine-tuning code will be shared soon, stay tuned!
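+
+ Until then, here is a minimal sketch of what the fine-tuning could look like with [🤗`trl`](https://github.com/huggingface/trl),
+ assuming the `ORPOConfig`/`ORPOTrainer` API; the dataset identifier and the hyperparameters are illustrative placeholders,
+ not the actual configuration used for this model:
+
+ ```python
+ # Minimal ORPO fine-tuning sketch with `trl` (illustrative, not the actual training script)
+ from datasets import load_dataset
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import ORPOConfig, ORPOTrainer
+
+ model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
+ tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
+ # Mistral's tokenizer ships without a pad token, so reuse the EOS token for padding
+ tokenizer.pad_token = tokenizer.eos_token
+
+ # Hypothetical dataset with string `prompt`, `chosen`, and `rejected` columns
+ dataset = load_dataset("some-org/preference-dataset", split="train")
+
+ config = ORPOConfig(
+     output_dir="./mistral-7b-orpo",
+     beta=0.1,  # weighs the odds-ratio term, i.e. the lambda in the paper
+     max_length=2048,
+     max_prompt_length=1024,
+     per_device_train_batch_size=4,
+     num_train_epochs=10,
+ )
+
+ trainer = ORPOTrainer(
+     model=model,
+     args=config,
+     train_dataset=dataset,
+     tokenizer=tokenizer,
+ )
+ trainer.train()
+ ```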
 
  ## About the dataset