metadata
license: apache-2.0

Model Card for Vijil Prompt Injection

Model Details

Model Description

This model is a fine-tuned version of ModernBERT that classifies prompts as prompt-injection attempts, i.e. inputs crafted to manipulate language models into producing unintended outputs.

Uses

Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("vijil/mbert-prompt-injection")
model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

print(classifier("this is a prompt-injection prompt"))
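A text-classification pipeline returns a list of dicts with `label` and `score` keys. The following is a minimal sketch of turning one such result into a yes/no decision. The label name `"INJECTION"` and the 0.5 threshold are assumptions made for illustration, not values taken from this model; check `model.config.id2label` for the labels the model actually emits.

```python
def is_injection(result, injection_label="INJECTION", threshold=0.5):
    """Return True if one pipeline result flags the prompt as an injection.

    `result` is a dict shaped like {"label": ..., "score": ...}, i.e. one
    element of the list returned by a text-classification pipeline.
    `injection_label` and `threshold` are hypothetical defaults.
    """
    return result["label"] == injection_label and result["score"] >= threshold


# Stand-in outputs with the same shape as the pipeline's return value,
# so the helper can be tried without downloading the model.
sample_outputs = [
    {"label": "INJECTION", "score": 0.98},
    {"label": "SAFE", "score": 0.91},
    {"label": "INJECTION", "score": 0.42},
]
decisions = [is_injection(r) for r in sample_outputs]
print(decisions)
```

In an application, a prompt for which the helper returns True could be rejected or routed for review before it reaches the downstream language model.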

Training Details

Training Data

The training data was taken from the following datasets:

https://huggingface.co/datasets/allenai/wildguardmix

https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection

Training Procedure

Supervised fine-tuning on the datasets listed above.

Training Hyperparameters

learning_rate: 5e-05

train_batch_size: 32

eval_batch_size: 32

optimizer: adamw_torch_fused

lr_scheduler_type: cosine_with_restarts

warmup_ratio: 0.1

num_epochs: 3
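To make the interaction of `learning_rate`, `warmup_ratio`, and `cosine_with_restarts` concrete, here is a rough sketch of the learning-rate curve these settings imply. It assumes linear warmup followed by cosine decay with hard restarts, a single cycle (`num_cycles=1`), and a hypothetical total of 1000 optimizer steps. The cycle count and step count are not stated in this card, and the function is an illustration rather than the training code that was used.

```python
import math


def lr_at(step, total_steps, base_lr=5e-05, warmup_ratio=0.1, num_cycles=1):
    """Learning rate at `step` for linear warmup + cosine with hard restarts.

    `total_steps` and `num_cycles` are assumptions; only `base_lr` and
    `warmup_ratio` come from the hyperparameters listed above.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that restarts at the start of each cycle.
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))


total_steps = 1000  # hypothetical step count for illustration
schedule = [lr_at(s, total_steps) for s in range(total_steps + 1)]
print(f"start={schedule[0]:.2e} peak={max(schedule):.2e} end={schedule[-1]:.2e}")
```

With a single cycle the curve is the same as plain cosine decay; a larger `num_cycles` would reset the rate to its peak at the start of each cycle.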

Evaluation

Training Loss: 0.0036

Validation Loss: 0.209392

Accuracy: 0.961538

Precision: 0.958362

Recall: 0.957055

F1: 0.957708
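As a quick consistency check on the figures above, the F1 score can be recomputed as the harmonic mean of the reported precision and recall. This sketch assumes all three numbers refer to the same class and the same averaging mode, which the card does not state.

```python
# Reported values from the evaluation list above.
precision = 0.958362
recall = 0.957055
reported_f1 = 0.957708

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"recomputed F1: {f1:.6f}")
print(f"difference from reported value: {abs(f1 - reported_f1):.2e}")
```

A difference near zero suggests the reported precision, recall, and F1 were computed together on the same evaluation set.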

Testing Data

The testing data was taken from the following datasets:

https://huggingface.co/datasets/allenai/wildguardmix

https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection

Results

Model Card Contact

https://vijil.ai
