Commit 9442df0 · Parent: 2b079b2
Update README.md

README.md CHANGED
@@ -50,7 +50,7 @@ filtering_task_list = [
 ]
 ```
 
-Using the datasets mentioned above, we
+Using the datasets mentioned above, we applied SFT and iterative DPO training, a proprietary alignment strategy, to maximize the performance of our resulting model.
 
 [1] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C., 2023. Direct preference optimization: Your language model is secretly a reward model. NeurIPS.
 
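The added sentence cites DPO [1] as part of the alignment recipe. For context, a minimal sketch of the per-pair DPO objective from that paper — not the authors' proprietary pipeline; the function name and the scalar summed log-probability inputs are illustrative assumptions:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss (Rafailov et al., 2023).

    Inputs are summed token log-probabilities of the chosen and
    rejected responses under the trained policy and the frozen
    reference model; beta scales the implicit reward.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written stably as log(1 + exp(-logits))
    return math.log1p(math.exp(-logits))
```

When the policy prefers the chosen response more strongly than the reference does, `logits` is positive and the loss falls below log 2; iterative DPO repeats this optimization over freshly generated preference pairs.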