Commit 9442df0 · Parent: 2b079b2
Update README.md

README.md CHANGED
@@ -50,7 +50,7 @@ filtering_task_list = [
 ]
 ```
 
-Using the datasets mentioned above, we
+Using the datasets mentioned above, we applied SFT and iterative DPO training, a proprietary alignment strategy, to maximize the performance of our resulting model.
 
 [1] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C., 2023. Direct preference optimization: Your language model is secretly a reward model. NeurIPS.
 
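The added sentence cites DPO [1] as part of the alignment recipe. For context, a minimal sketch of the per-pair DPO objective from that paper — not the authors' proprietary pipeline; the function name and the scalar summed log-probability inputs are illustrative assumptions:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss (Rafailov et al., 2023).

    Inputs are summed token log-probabilities of the chosen and
    rejected responses under the trained policy and the frozen
    reference model; beta scales the implicit reward.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written stably as log(1 + exp(-logits))
    return math.log1p(math.exp(-logits))
```

When the policy prefers the chosen response more strongly than the reference does, `logits` is positive and the loss falls below log 2; iterative DPO repeats this optimization over freshly generated preference pairs.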