|
--- |
|
language: |
|
- en |
|
base_model: |
|
- OpenAssistant/reward-model-deberta-v3-large-v2 |
|
--- |
|
|
|
## ReMoDetect: Robust Detection of Large Language Model Generated Texts Using a Reward Model
|
|
|
ReMoDetect addresses the growing risks of large language model (LLM) misuse, such as generating fake news, by improving detection of LLM-generated text (LGT). Rather than detecting each model individually, ReMoDetect targets a trait common to modern LLMs: alignment training, in which LLMs are fine-tuned to generate human-preferred text. Our key finding is that aligned LLMs produce texts with higher estimated preference than human-written ones, making them detectable with a reward model trained on the human preference distribution.
|
|
|
In ReMoDetect, we introduce two training strategies to enhance the reward model’s detection performance: |
|
1. **Continual preference fine-tuning**, which pushes the reward model to further prefer aligned LGTs. |
|
2. **Reward modeling of Human/LLM mixed texts**, where we use rephrased human-written texts as a middle ground between LGTs and human texts to improve detection. |
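The first strategy can be illustrated with a standard pairwise (Bradley-Terry) preference loss: aligned LGTs are treated as preferred over the rephrased mixed texts, which in turn are preferred over human-written texts, so the reward model learns the ordering LGT > mixed > human. The sketch below is illustrative rather than the paper's exact training code; the reward values and function name are hypothetical.

```python
import math

def pairwise_preference_loss(r_preferred: float, r_other: float) -> float:
    """Bradley-Terry style loss: -log sigmoid(r_preferred - r_other).
    Small when the preferred sample already receives the higher reward."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_preferred - r_other))))

# Hypothetical reward-model outputs for one training triplet.
r_lgt = 2.0     # aligned LLM-generated text (should rank highest)
r_mixed = 0.5   # rephrased human-written text (middle ground)
r_human = -1.0  # human-written text (should rank lowest)

# Continual preference fine-tuning pushes the reward model toward the
# ordering LGT > mixed > human via two pairwise terms.
loss = (pairwise_preference_loss(r_lgt, r_mixed)
        + pairwise_preference_loss(r_mixed, r_human))
print(round(loss, 4))  # -> 0.4028
```

In practice the loss is computed over the reward model's outputs on batches of LGT, mixed, and human texts; the mixed texts act as an intermediate anchor that sharpens the decision boundary between the two extremes.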
|
|
|
This approach achieves state-of-the-art results across several LLMs. For more technical details, check out our [paper](https://arxiv.org/abs/2405.17382). |
|
|
|
Please check the [official repository](https://github.com/hyunseoklee-ai/ReMoDetect) for implementation details and updates.
|
|
|
|
|
#### How to Use |
|
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "hyunseoki/ReMoDetect-deberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
detector = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "This text was written by a person."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)

# The model outputs a scalar reward score for the input text.
score = detector(**inputs).logits[0]
print(score)
```
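The printed logit is a scalar reward score; by the model's design, higher values indicate that the text is more likely LLM-generated. One simple way to turn raw scores into a binary decision is to compare against a threshold calibrated on held-out labeled data. The helper and threshold below are hypothetical, not part of the released model:

```python
def label_text(score: float, threshold: float = 0.0) -> str:
    """Map a scalar reward score to a label.
    `threshold` is a placeholder; calibrate it on labeled held-out data."""
    return "likely LLM-generated" if score > threshold else "likely human-written"

print(label_text(2.3))   # high reward -> likely LLM-generated
print(label_text(-1.7))  # low reward -> likely human-written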
|
|
|
### Citation |
|
|
|
If you find ReMoDetect-deberta useful for your work, please cite the following paper:
|
|
|
```bibtex
|
@misc{lee2024remodetect, |
|
title={ReMoDetect: Reward Models Recognize Aligned LLM's Generations}, |
|
author={Hyunseok Lee and Jihoon Tack and Jinwoo Shin}, |
|
year={2024}, |
|
eprint={2405.17382}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG}, |
|
url={https://arxiv.org/abs/2405.17382}, |
|
} |
|
``` |