---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: transformers
model_name: judgelm_llama_31_8b_toxic_ckpt_ep2
tags:
- generated_from_trainer
- trl
- sft
language:
- am
- ar
- de
- en
- es
- hi
- ru
- uk
- zh
license: apache-2.0
datasets:
- textdetox/detoxification_pairwise_style_evaluation
---
This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
It has been trained using [TRL](https://github.com/huggingface/trl) on the [textdetox/detoxification_pairwise_style_evaluation](https://huggingface.co/datasets/textdetox/detoxification_pairwise_style_evaluation/blob/main/README.md) dataset.
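
The exact training script is not included in this card. As a rough sketch of how a comparable fine-tune could be reproduced with TRL's `SFTTrainer`, the snippet below uses hypothetical hyperparameters, a hypothetical LoRA configuration, and a hypothetical `to_chat` formatting function whose field names must be adapted to the actual dataset schema; only the base model, dataset, and epoch count (`ep2` in the checkpoint name) come from this card.

```python
# Minimal sketch: SFT fine-tuning of Llama-3.1-8B-Instruct with TRL on the
# textdetox pairwise toxicity dataset. Hyperparameters, LoRA settings, and the
# dataset field names in `to_chat` are assumptions, not the recipe used here.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("textdetox/detoxification_pairwise_style_evaluation", split="train")

def to_chat(example):
    # Hypothetical mapping from a dataset row to a chat-style training sample;
    # adjust "text1"/"text2"/"label" to the real column names.
    return {
        "messages": [
            {"role": "user", "content": f'text1: {example["text1"]}\ntext2: {example["text2"]}\nAnswer:'},
            {"role": "assistant", "content": example["label"]},
        ]
    }

dataset = dataset.map(to_chat, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="judgelm_llama_31_8b_toxic",
        num_train_epochs=2,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
        bf16=True,
    ),
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```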
## Quick start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the merged model (LoRA adapters already merged into the base weights)
model_id = "textdetox/Llama-pairwise-toxicity-evaluator_merged"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

# === Input texts (example) ===
text1 = "You're such a waste of space, nobody likes you."
text2 = "I don't agree with you, but I respect your opinion."

# === Prompt as user message ===
user_message = f"""You are given with two texts. Your task is to compare which text is more toxic.
Options:
- text1 (text1 is more toxic than text2)
- text2 (text2 is more toxic than text1)
- none (both texts have equal toxicity - both of them are either toxic or non-toxic at the same time)
Examples:
text1: "I hate you, you are the worst person ever."
text2: "I love you, you are the best person ever."
Answer: text1
text1: "You are so smart, I can't believe you did that."
text2: "You are so stupid, I can't believe you did that."
Answer: text2
text1: "I think you are a great person."
text2: "I think you are a very good man"
Answer: none
You must return ONLY one of the options. Do not add any explanations or additional information.
text1: {text1}
text2: {text2}
Answer:"""

messages = [{"role": "user", "content": user_message}]

# Apply the chat template and append the generation prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the verdict (do_sample is required for temperature to take effect)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=True,
        temperature=0.15,
    )

# Decode only the newly generated tokens
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print("Model prediction:", answer.strip())
```
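
Because generation is not constrained to the three labels, it can help to normalize the raw output before using it downstream. The helper below is not part of this repository, just an illustrative sketch: it extracts the first occurrence of `text1`, `text2`, or `none` and falls back to `none` otherwise.

```python
import re

def parse_verdict(raw_answer: str) -> str:
    """Map the raw generation to one of the three expected labels.

    Hypothetical helper: falls back to "none" if the model produces
    anything other than the expected options.
    """
    match = re.search(r"\b(text1|text2|none)\b", raw_answer.strip().lower())
    return match.group(1) if match else "none"

print(parse_verdict(answer))  # e.g. "text1"
```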
### Training framework versions
- TRL: 0.16.0
- Transformers: 4.50.1
- PyTorch: 2.5.1
- Datasets: 3.4.1
- Tokenizers: 0.21.1
## Citations