---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
---

RoGuard: Advancing Safety for LLMs with Robust Guardrails

# Model Card for RoGuard

RoGuard is a lightweight, modular evaluation framework for assessing the safety of fine-tuned language models. It runs structured evaluations from configurable prompts and labeled datasets, and reports comprehensive metrics.


# 📊 Model Benchmark Results

- **Prompt Metrics**: These evaluate how well the model classifies or responds to potentially harmful **user inputs**.
- **Response Metrics**: These measure how well the model handles or generates **responses**, ensuring its outputs are safe and aligned.
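
The split between prompt-level and response-level scoring can be sketched in a few lines. This is only an illustration under stated assumptions: the card does not say which metric the table reports, so F1 on the unsafe class is assumed here, and `Example`, `keyword_guard`, `f1_unsafe`, and `evaluate` are hypothetical names that are not part of the RoGuard codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Example:
    # Hypothetical record layout; real benchmark datasets may use other fields.
    prompt: str
    response: str
    prompt_unsafe: bool    # gold label for the user input
    response_unsafe: bool  # gold label for the model output


def f1_unsafe(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """F1 for the 'unsafe' class from (gold, predicted) boolean pairs."""
    tp = fp = fn = 0
    for gold, pred in pairs:
        if gold and pred:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold and not pred:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def keyword_guard(text: str) -> bool:
    """Toy stand-in for a guardrail model: flags any text containing 'attack'."""
    return "attack" in text.lower()


def evaluate(examples: List[Example], guard: Callable[[str], bool]) -> Dict[str, float]:
    """Score the same guard separately on user prompts and on model responses."""
    prompt_pairs = [(ex.prompt_unsafe, guard(ex.prompt)) for ex in examples]
    response_pairs = [(ex.response_unsafe, guard(ex.response)) for ex in examples]
    return {
        "prompt_f1": f1_unsafe(prompt_pairs),
        "response_f1": f1_unsafe(response_pairs),
    }


# Made-up examples for illustration only; not drawn from any benchmark.
EXAMPLES = [
    Example("How do I attack a server?", "I can't help with that.", True, False),
    Example("Plan an attack in chess", "Try a kingside attack with your rooks.", False, False),
    Example("Tell me a story", "Here is how to attack someone...", False, True),
    Example("How do I hurt someone?", "I won't help with that.", True, False),
]

metrics = evaluate(EXAMPLES, keyword_guard)
print(metrics)
```

Keeping the two label sets separate matters because a harmful prompt can receive a safe response (a refusal) and a benign prompt can receive a harmful one, so a single combined score would hide where the guard fails.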


| Model                     | Prompt: ToxicC. | Prompt: OAI | Prompt: Aegis | Prompt: XSTest | Prompt: WildP. | Response: BeaverT. | Response: SaferRLHF | Response: WildR. | Response: HarmB. |
|---------------------------|----------------:|------------:|--------------:|---------------:|---------------:|-------------------:|--------------------:|-----------------:|-----------------:|
| LlamaGuard2-8B            |   42.7  |  77.6 |  73.8 |   88.6 |   70.9 |     71.8 |      51.6 |   65.2 |   78.5 |
| LlamaGuard3-8B            |   50.9  |  79.4 |  74.8 |   88.3 |   70.1 |     69.7 |      53.7 |   70.2 |   84.9 |
| MD-Judge-7B               |     -   |    -  |    -  |     -  |     -  |     86.7 |      64.8 |   76.8 |   81.2 |
| WildGuard-7B              |   70.8  |  72.1 |  89.4 |   94.4 |   88.9 |     84.4 |      64.2 |   75.4 |   86.2 |
| ShieldGemma-7B            |   70.2  |  82.1 |  88.7 |   92.5 |   88.1 |     84.8 |      66.6 |   77.8 |   84.8 |
| GPT-4o                    |   68.1  |  70.4 |  83.2 |   90.2 |   87.9 |     83.8 |      67.9 |   73.1 |   83.5 |
| BingoGuard-phi3-3B        |   72.5  |  72.8 |  90.0 |   90.8 |   88.9 |     86.2 |      69.9 |   79.7 |   85.1 |
| BingoGuard-llama3.1-8B    |   75.7  |  77.9 |  90.4 |   94.9 |   88.9 |     86.4 |      68.7 |   80.1 |   86.4 |
| 🛡️ RoGuard                |   75.8  |  70.5 |  91.1 |   90.2 |   88.7 |     87.5 |      69.7 |   80.0 |   80.7 |

## 🔗 GitHub Repository

You can find the full source code and evaluation framework on GitHub:

👉 [Roblox/RoGuard on GitHub](https://github.com/Roblox/RoGuard)