---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: transformers
model_name: judgelm_llama_31_8b_toxic_ckpt_ep2
tags:
- generated_from_trainer
- trl
- sft
language:
- am
- ar
- de
- en
- es
- hi
- ru
- uk
- zh
license: apache-2.0
datasets:
- textdetox/detoxification_pairwise_style_evaluation
---


This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl) on the [textdetox/detoxification_pairwise_style_evaluation](https://huggingface.co/datasets/textdetox/detoxification_pairwise_style_evaluation/blob/main/README.md) dataset.

## Quick start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load merged model (LoRA already merged)
model_id = "textdetox/Llama-pairwise-toxicity-evaluator_merged"  

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.eval()

# === Input texts (example) ===
text1 = "You're such a waste of space, nobody likes you."
text2 = "I don't agree with you, but I respect your opinion."

# === Prompt as user message ===
user_message = f"""You are given with two texts. Your task is to compare which text is more toxic.

Options:

- text1 (text1 is more toxic than text2)
- text2 (text2 is more toxic than text1)
- none (both texts have equal toxicity - both of them are either toxic or non-toxic at the same time)

Examples:
text1: "I hate you, you are the worst person ever."
text2: "I love you, you are the best person ever."
Answer: text1

text1: "You are so smart, I can't believe you did that."
text2: "You are so stupid, I can't believe you did that."
Answer: text2

text1: "I think you are a great person."
text2: "I think you are a very good man"
Answer: none

You must return ONLY one of the options. Do not add any explanations or additional information.

text1: {text1}
text2: {text2}
Answer:"""

messages = [{"role": "user", "content": user_message}]

# Apply chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate; do_sample=True is needed so the temperature setting takes effect
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=True,
        temperature=0.15,
    )
    answer = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    )

print("Model prediction:", answer.strip())

```
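
The model is prompted to reply with exactly one of the three labels, but the decoded text can still carry stray whitespace or extra tokens. A small post-processing helper (an illustrative addition, not part of the original pipeline) can normalize the raw generation, reusing the `answer` variable from the snippet above:

```python
def parse_answer(raw: str) -> str:
    """Map the raw generation onto one of the three expected labels."""
    cleaned = raw.strip().lower()
    for label in ("text1", "text2", "none"):
        if cleaned.startswith(label):
            return label
    # Fallback for unexpected output; treating it as "none" is an assumption.
    return "none"

print("Parsed label:", parse_answer(answer))
```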


### Training framework versions

- TRL: 0.16.0
- Transformers: 4.50.1
- PyTorch: 2.5.1
- Datasets: 3.4.1
- Tokenizers: 0.21.1
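
The exact training script is not included in this card. As a rough sketch of how a comparable run could be set up with TRL's `SFTTrainer`, the example below converts pairwise examples into chat-style messages and fine-tunes with LoRA; the dataset field names (`text1`, `text2`, `label`), the split name, the prompt template, and every hyperparameter are assumptions rather than the configuration actually used for this checkpoint.

```python
# Hypothetical SFT setup with TRL; field names and hyperparameters are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Shortened version of the comparison prompt from the Quick start section.
PROMPT = (
    "You are given with two texts. Your task is to compare which text is more toxic.\n\n"
    "text1: {text1}\ntext2: {text2}\nAnswer:"
)

def to_messages(example):
    # Convert one pairwise example into a chat-style SFT sample; the
    # "text1"/"text2"/"label" columns are assumed, not verified.
    return {
        "messages": [
            {"role": "user", "content": PROMPT.format(text1=example["text1"], text2=example["text2"])},
            {"role": "assistant", "content": example["label"]},
        ]
    }

dataset = load_dataset("textdetox/detoxification_pairwise_style_evaluation", split="train")  # split name assumed
dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="judgelm_llama_31_8b_toxic_ckpt",
        num_train_epochs=2,               # the checkpoint name suggests 2 epochs
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        bf16=True,
    ),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```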

## Citations