|
--- |
|
language: |
|
- en |
|
base_model: |
|
- OpenAssistant/reward-model-deberta-v3-large-v2 |
|
--- |
|
|
|
## ReMoDetect: Robust Detection of Large Language Model Generated Texts Using a Reward Model
|
|
|
ReMoDetect addresses the growing risks of large language model (LLM) misuse, such as generating fake news, by improving detection of LLM-generated text (LGT). Rather than detecting each model individually, ReMoDetect targets a trait common to modern LLMs: alignment training, in which LLMs are fine-tuned to generate human-preferred text. Our key finding is that aligned LLMs produce texts with higher estimated preference than human-written ones, making them detectable with a reward model trained on the human preference distribution.
|
|
|
In ReMoDetect, we introduce two training strategies to enhance the reward model’s detection performance: |
|
1. **Continual preference fine-tuning**, which pushes the reward model to further prefer aligned LGTs. |
|
2. **Reward modeling of Human/LLM mixed texts**, where we use rephrased human-written texts as a middle ground between LGTs and human texts to improve detection. |
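The first strategy can be illustrated with a standard pairwise (Bradley-Terry) preference loss: aligned LGTs are treated as preferred over the rephrased mixed texts, which in turn are preferred over human-written texts, so the reward model learns the ordering LGT > mixed > human. The sketch below is illustrative rather than the paper's exact training code; the reward values and function name are hypothetical.

```python
import math

def pairwise_preference_loss(r_preferred: float, r_other: float) -> float:
    """Bradley-Terry style loss: -log sigmoid(r_preferred - r_other).
    Small when the preferred sample already receives the higher reward."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_preferred - r_other))))

# Hypothetical reward-model outputs for one training triplet.
r_lgt = 2.0     # aligned LLM-generated text (should rank highest)
r_mixed = 0.5   # rephrased human-written text (middle ground)
r_human = -1.0  # human-written text (should rank lowest)

# Continual preference fine-tuning pushes the reward model toward the
# ordering LGT > mixed > human via two pairwise terms.
loss = (pairwise_preference_loss(r_lgt, r_mixed)
        + pairwise_preference_loss(r_mixed, r_human))
print(round(loss, 4))  # -> 0.4028
```

In practice the loss is computed over the reward model's outputs on batches of LGT, mixed, and human texts; the mixed texts act as an intermediate anchor that sharpens the decision boundary between the two extremes.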
|
|
|
This approach achieves state-of-the-art results across several LLMs. For more technical details, check out our [paper](https://arxiv.org/abs/2405.17382). |
|
|
|
Please check the [official repository](https://github.com/hyunseoklee-ai/ReMoDetect) for implementation details and updates.
|
|
|
|
|
#### How to Use |
|
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "hyunseoki/ReMoDetect-deberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
detector = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "This text was written by a person."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding=True)

# The model outputs a scalar reward score for the input text.
score = detector(**inputs).logits[0]
print(score)
```
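The printed logit is a scalar reward score; by the model's design, higher values indicate that the text is more likely LLM-generated. One simple way to turn raw scores into a binary decision is to compare against a threshold calibrated on held-out labeled data. The helper and threshold below are hypothetical, not part of the released model:

```python
def label_text(score: float, threshold: float = 0.0) -> str:
    """Map a scalar reward score to a label.
    `threshold` is a placeholder; calibrate it on labeled held-out data."""
    return "likely LLM-generated" if score > threshold else "likely human-written"

print(label_text(2.3))   # high reward -> likely LLM-generated
print(label_text(-1.7))  # low reward -> likely human-written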
|
|
|
### Citation |
|
|
|
If you find ReMoDetect-deberta useful for your work, please cite the following paper:
|
|
|
```bibtex
|
@misc{lee2024remodetect, |
|
title={ReMoDetect: Reward Models Recognize Aligned LLM's Generations}, |
|
author={Hyunseok Lee and Jihoon Tack and Jinwoo Shin}, |
|
year={2024}, |
|
eprint={2405.17382}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG}, |
|
url={https://arxiv.org/abs/2405.17382}, |
|
} |
|
``` |