Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Paper: arXiv:2407.09121
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the Evol-Instruct and BeaverTails datasets. Starting from LLaMA3-8B-Instruct, training was continued for 100 steps with DeRTa.
For details, please refer to the paper Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training and the DeRTa GitHub repository.
Input format:
[INST] Your Instruction [\INST]
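As a minimal sketch of the input format above, the following helper wraps an instruction in the tags exactly as this card writes them. The function name `build_prompt` is hypothetical and not part of the DeRTa code base. The closing tag is copied verbatim from the card, backslash included; whether surrounding whitespace matters to the model is an assumption left untested here.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the input format shown on this model card.

    Hypothetical helper for illustration; not taken from the DeRTa repository.
    """
    # Tags are reproduced verbatim from the card, including the backslash
    # in the closing tag. The backslash is escaped for the Python literal.
    open_tag = "[INST]"
    close_tag = "[\\INST]"
    # Strip stray leading/trailing whitespace so the spacing matches the
    # card's template: one space after the open tag, one before the close tag.
    return open_tag + " " + instruction.strip() + " " + close_tag


prompt = build_prompt("Summarize the plot of Hamlet in two sentences.")
print(prompt)
```

The resulting string can be passed to whatever tokenizer and generation call you use to run the model.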
The model is trained with DeRTa and shows strong safety performance.
The following hyperparameters were used during training:
Base model
meta-llama/Meta-Llama-3-8B-Instruct