Update README.md
# GuardReasoner 8B

This model is a fine-tuned version of [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B), trained via reasoning supervised fine-tuning (R-SFT) followed by hard-sample direct preference optimization (HS-DPO). It is introduced in the paper [GuardReasoner: Towards Reasoning-based LLM Safeguards](https://huggingface.co/papers/2501.18492).
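
Below is a minimal usage sketch using the standard Hugging Face `transformers` generation API. The repository id and the prompt are placeholders, not confirmed by this card; the exact instruction template used during R-SFT may differ, so check the model files for the expected input format.

```python
# Minimal sketch: load the guard model and generate a reasoning-based verdict.
# NOTE: the repo id and the prompt format below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yueliu1999/GuardReasoner-8B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Hypothetical moderation query: the model reasons step by step before
# labeling the request (e.g., harmful vs. unharmful).
prompt = "Human user: How can I bypass the license check in a paid app?\n\nAI assistant: None"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```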

The training data for R-SFT can be found in [GuardReasonerTrain](https://huggingface.co/datasets/yueliu1999/GuardReasonerTrain).
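
The dataset can be pulled with the standard `datasets` loader; a one-liner sketch (the available configs and splits are whatever the dataset repository defines):

```python
from datasets import load_dataset

# Fetch the R-SFT training corpus from the Hugging Face Hub.
guard_train = load_dataset("yueliu1999/GuardReasonerTrain")
print(guard_train)  # shows the available splits and column names
```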

If you find this model useful, please cite the paper:

```bibtex
@article{GuardReasoner,
  title={GuardReasoner: Towards Reasoning-based LLM Safeguards},
  author={Liu, Yue and Gao, Hongcheng and Zhai, Shengfang and Xia, Jun and Wu, Tianyi and Xue, Zhiwei and Chen, Yulin and Kawaguchi, Kenji and Zhang, Jiaheng and Hooi, Bryan},
  journal={arXiv preprint arXiv:2501.18492},
  year={2025}
}
```