Update README.md
README.md CHANGED
@@ -28,7 +28,27 @@ In order to fine-tune [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistr
`orpo` from [🤗`trl`](https://github.com/huggingface/trl) has been used, thanks to the invaluable and quick contribution of @kashif.

ORPO stands for Odds Ratio Preference Optimization, and it defines a new paradigm for fine-tuning LLMs, “combining” both the SFT and the PPO/DPO stages into a single one, thanks to the proposed loss function that starts directly from a preference dataset, i.e. chosen-rejected pairs.
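
Roughly speaking (the notation below is my own simplified sketch, see the paper for the exact formulation), the chosen response is learned with the usual SFT loss, while an additional odds-ratio term increases the likelihood of the chosen response relative to the rejected one:

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,\,y_w,\,y_l)}\big[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\big],
\qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right)
$$

where $\mathrm{odds}_\theta(y \mid x) = P_\theta(y \mid x) / \big(1 - P_\theta(y \mid x)\big)$, with $y_w$ the chosen and $y_l$ the rejected completion.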

Some key features about ORPO:
- ⚡️ Faster to train, as it is now a single-stage fine-tuning
- 👨🏻‍🏫 Requires preference data, i.e. (prompt, chosen, rejected)-like datasets (see the example right after this list)
- ⬇️ Less memory than PPO/DPO, as it doesn't need a reference model
- 🏆 SOTA results for Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) when fine-tuned using single-turn UltraFeedback
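
For reference, a row of such a preference dataset could look roughly like the following (a made-up example, not taken from the dataset described below):

```python
# Made-up example of a single preference row: one prompt plus a
# preferred (chosen) and a dispreferred (rejected) completion.
preference_row = {
    "prompt": "Explain what a preference dataset is.",
    "chosen": "A preference dataset pairs each prompt with a preferred and a dispreferred completion...",
    "rejected": "I don't know.",
}
```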

Some notes on the experiments mentioned in the paper:
- 📌 Up to 7B parameter LLMs were fine-tuned, achieving better performance compared to other 7B counterparts and even to 13B LLMs
- 📌 Not yet trained with multi-turn datasets such as Capybara (may be an interesting experiment to run)
- 📌 The OPT models were fine-tuned with HH-RLHF from Anthropic, truncated and padded to 1024 tokens, filtering out the prompts with > 1024 tokens
- 📌 Phi-2, Mistral (7B), and Llama 2 (7B) were fine-tuned with UltraFeedback from OpenBMB, truncated and padded to 2048 tokens, filtering out the prompts with > 1024 tokens
- 📌 Fine-tuned for 10 epochs, using the evaluation loss as the metric for selecting the best models

For more information about ORPO, I highly recommend reading their paper, titled [`ORPO: Monolithic Preference Optimization without Reference Model`](https://huggingface.co/papers/2403.07691), as it contains a lot of information and details not only on the ORPO method itself, but also on the experiments they ran, the results they got, and much more.

📅 Fine-tuning code will be shared soon, stay tuned!
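
In the meantime, here is a rough sketch of what such a run could look like with `ORPOTrainer` from 🤗`trl` (illustrative only: the dataset name and hyper-parameters below are placeholders/assumptions, not the actual setup, and argument names may vary across `trl` versions):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"

# Load the base model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Any preference dataset with `prompt`, `chosen`, and `rejected` columns
# ("preference-dataset" is a placeholder, not the dataset described below)
dataset = load_dataset("preference-dataset", split="train")

# ORPOConfig extends the usual TrainingArguments with ORPO-specific knobs
args = ORPOConfig(
    output_dir="./mistral-7b-orpo",
    beta=0.1,                       # weight of the odds-ratio term (lambda in the paper)
    max_length=2048,
    max_prompt_length=1024,
    per_device_train_batch_size=2,  # illustrative values, tune for your hardware
    gradient_accumulation_steps=8,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```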

## About the dataset