|
---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: transformers
model_name: judgelm_llama_31_8b_toxic_ckpt_ep2
tags:
- generated_from_trainer
- trl
- sft
language:
- am
- ar
- de
- en
- es
- hi
- ru
- uk
- zh
license: apache-2.0
datasets:
- textdetox/detoxification_pairwise_style_evaluation
---
|
|
|
|
|
This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct). It was trained using [TRL](https://github.com/huggingface/trl) on the [textdetox/detoxification_pairwise_style_evaluation](https://huggingface.co/datasets/textdetox/detoxification_pairwise_style_evaluation) dataset and acts as a pairwise toxicity judge: given two texts, it answers which one is more toxic (`text1`, `text2`, or `none` when both are equally toxic or equally non-toxic).
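
The training pairs can be inspected directly from the Hub before running the judge. A minimal sketch (split and column names are printed rather than assumed):

```python
from datasets import load_dataset

# Load the pairwise style-evaluation dataset used for fine-tuning
ds = load_dataset("textdetox/detoxification_pairwise_style_evaluation")
print(ds)  # available splits and their columns

first_split = next(iter(ds.values()))
print(first_split[0])  # one pairwise example
```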
|
|
|
## Quick start |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the merged model (LoRA adapters already merged into the base weights)
model_id = "textdetox/Llama-pairwise-toxicity-evaluator_merged"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.eval()

# === Input texts (example) ===
text1 = "You're such a waste of space, nobody likes you."
text2 = "I don't agree with you, but I respect your opinion."

# === Prompt as user message ===
user_message = f"""You are given two texts. Your task is to compare which text is more toxic.

Options:

- text1 (text1 is more toxic than text2)
- text2 (text2 is more toxic than text1)
- none (both texts have equal toxicity - both of them are either toxic or non-toxic at the same time)

Examples:
text1: "I hate you, you are the worst person ever."
text2: "I love you, you are the best person ever."
Answer: text1

text1: "You are so smart, I can't believe you did that."
text2: "You are so stupid, I can't believe you did that."
Answer: text2

text1: "I think you are a great person."
text2: "I think you are a very good man"
Answer: none

You must return ONLY one of the options. Do not add any explanations or additional information.

text1: {text1}
text2: {text2}
Answer:"""

messages = [{"role": "user", "content": user_message}]

# Apply the model's chat template and append the generation prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate; do_sample=True is needed for the temperature setting to take effect
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=True,
        temperature=0.15,
    )

# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)

print("Model prediction:", answer.strip())
```
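
The model is instructed to answer with exactly one of `text1`, `text2`, or `none`, but decoded output can still carry whitespace or trailing tokens. A small post-processing helper can normalize it; this is an illustrative sketch (the `parse_judgement` name and the `"unparsed"` fallback are our own, not part of the released code):

```python
def parse_judgement(raw: str) -> str:
    """Map a raw completion to one of the three expected options."""
    normalized = raw.strip().lower()
    for label in ("text1", "text2", "none"):
        if normalized.startswith(label):
            return label
    return "unparsed"  # completion did not match any option

# e.g. applied to the `answer` produced by the quick start above
print(parse_judgement(" text1\n"))  # -> text1
```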
|
|
|
|
|
### Training framework versions |
|
|
|
- TRL: 0.16.0 |
|
- Transformers: 4.50.1 |
|
- PyTorch: 2.5.1
|
- Datasets: 3.4.1 |
|
- Tokenizers: 0.21.1 |
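
Before reproducing the run, a quick sanity check against the versions above (a sketch that only prints the locally installed versions):

```python
import datasets, tokenizers, torch, transformers, trl

# Compare against the training versions listed above
for mod in (trl, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```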
|
|
|
## Citations |