|
---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: transformers
model_name: judgelm_llama_31_8b_toxic_ckpt_ep2
tags:
- generated_from_trainer
- trl
- sft
language:
- am
- ar
- de
- en
- es
- hi
- ru
- uk
- zh
license: apache-2.0
datasets:
- textdetox/detoxification_pairwise_style_evaluation
---
|
|
|
|
|
This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct). It was trained using [TRL](https://github.com/huggingface/trl) on the [textdetox/detoxification_pairwise_style_evaluation](https://huggingface.co/datasets/textdetox/detoxification_pairwise_style_evaluation) dataset and acts as a pairwise toxicity judge: given two texts, it answers which one is more toxic (`text1`, `text2`, or `none` when both are equally toxic or equally non-toxic).
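
The training pairs can be inspected directly from the Hub before running the judge. A minimal sketch (split and column names are printed rather than assumed):

```python
from datasets import load_dataset

# Load the pairwise style-evaluation dataset used for fine-tuning
ds = load_dataset("textdetox/detoxification_pairwise_style_evaluation")
print(ds)  # available splits and their columns

first_split = next(iter(ds.values()))
print(first_split[0])  # one pairwise example
```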
|
|
|
## Quick start |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the merged model (LoRA adapters already merged into the base weights)
model_id = "textdetox/Llama-pairwise-toxicity-evaluator_merged"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.eval()

# === Input texts (example) ===
text1 = "You're such a waste of space, nobody likes you."
text2 = "I don't agree with you, but I respect your opinion."

# === Prompt as user message ===
user_message = f"""You are given two texts. Your task is to compare which text is more toxic.

Options:

- text1 (text1 is more toxic than text2)
- text2 (text2 is more toxic than text1)
- none (both texts have equal toxicity - both of them are either toxic or non-toxic at the same time)

Examples:
text1: "I hate you, you are the worst person ever."
text2: "I love you, you are the best person ever."
Answer: text1

text1: "You are so smart, I can't believe you did that."
text2: "You are so stupid, I can't believe you did that."
Answer: text2

text1: "I think you are a great person."
text2: "I think you are a very good man"
Answer: none

You must return ONLY one of the options. Do not add any explanations or additional information.

text1: {text1}
text2: {text2}
Answer:"""

messages = [{"role": "user", "content": user_message}]

# Apply the model's chat template and append the generation prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate; do_sample=True is needed for the temperature setting to take effect
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=True,
        temperature=0.15,
    )

# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)

print("Model prediction:", answer.strip())
```
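
The model is instructed to answer with exactly one of `text1`, `text2`, or `none`, but decoded output can still carry whitespace or trailing tokens. A small post-processing helper can normalize it; this is an illustrative sketch (the `parse_judgement` name and the `"unparsed"` fallback are our own, not part of the released code):

```python
def parse_judgement(raw: str) -> str:
    """Map a raw completion to one of the three expected options."""
    normalized = raw.strip().lower()
    for label in ("text1", "text2", "none"):
        if normalized.startswith(label):
            return label
    return "unparsed"  # completion did not match any option

# e.g. applied to the `answer` produced by the quick start above
print(parse_judgement(" text1\n"))  # -> text1
```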
|
|
|
|
|
### Training framework versions |
|
|
|
- TRL: 0.16.0 |
|
- Transformers: 4.50.1 |
|
- PyTorch: 2.5.1
|
- Datasets: 3.4.1 |
|
- Tokenizers: 0.21.1 |
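
Before reproducing the run, a quick sanity check against the versions above (a sketch that only prints the locally installed versions):

```python
import datasets, tokenizers, torch, transformers, trl

# Compare against the training versions listed above
for mod in (trl, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```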
|
|
|
## Citations |