---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
license: llama3.1
---

# Model Card for RoGuard
RoGuard is a safety guardrail model, fine-tuned from Meta-Llama-3.1-8B-Instruct with PEFT, that classifies both user prompts and model responses for harmful content. It ships with a lightweight, modular evaluation framework that scores models against configurable prompts and labeled datasets and reports comprehensive metrics.
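The evaluation flow described above (run a classifier over labeled examples, then report metrics) can be sketched as a small loop. All names here are illustrative, not the framework's actual API; a keyword stub stands in for the real model:

```python
from typing import Callable, List, Tuple

def evaluate(classify: Callable[[str], int],
             dataset: List[Tuple[str, int]]) -> dict:
    """Run a guard classifier over (text, label) pairs and report accuracy."""
    correct = sum(1 for text, label in dataset if classify(text) == label)
    return {"n": len(dataset), "accuracy": correct / len(dataset)}

# Stub classifier: flags texts containing a keyword as unsafe (1).
stub = lambda text: int("attack" in text.lower())
data = [("How do I plan an attack?", 1), ("What's the weather?", 0)]
print(evaluate(stub, data))  # {'n': 2, 'accuracy': 1.0}
```

The real framework swaps the stub for model inference and reports richer metrics than accuracy, but the dataset-in, scores-out shape is the same.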
# 📊 Model Benchmark Results

- **Prompt Metrics**: how well the model classifies potentially harmful **user inputs**.
- **Response Metrics**: how well the model evaluates model-generated **responses**, ensuring the outputs it passes are safe and aligned.
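The card does not state the exact metric behind the numbers below; guard-model benchmarks like these typically report binary F1 over safe/unsafe labels. A minimal sketch of that computation (names illustrative):

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 over safe (0) / unsafe (1) labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: four prompts, gold labels vs. guard predictions.
labels = [1, 0, 1, 1]
preds = [1, 0, 0, 1]
print(round(f1_score(labels, preds), 3))  # 0.8
```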
The first five score columns are prompt-classification benchmarks (ToxicC., OAI, Aegis, XSTest, WildP.); the last four are response-classification benchmarks (BeaverT., SaferRLHF, WildR., HarmB.).

| Model                  | ToxicC. | OAI  | Aegis | XSTest | WildP. | BeaverT. | SaferRLHF | WildR. | HarmB. |
|------------------------|--------:|-----:|------:|-------:|-------:|---------:|----------:|-------:|-------:|
| LlamaGuard2-8B         | 42.7    | 77.6 | 73.8  | 88.6   | 70.9   | 71.8     | 51.6      | 65.2   | 78.5   |
| LlamaGuard3-8B         | 50.9    | 79.4 | 74.8  | 88.3   | 70.1   | 69.7     | 53.7      | 70.2   | 84.9   |
| MD-Judge-7B            | -       | -    | -     | -      | -      | 86.7     | 64.8      | 76.8   | 81.2   |
| WildGuard-7B           | 70.8    | 72.1 | 89.4  | 94.4   | 88.9   | 84.4     | 64.2      | 75.4   | 86.2   |
| ShieldGemma-7B         | 70.2    | 82.1 | 88.7  | 92.5   | 88.1   | 84.8     | 66.6      | 77.8   | 84.8   |
| GPT-4o                 | 68.1    | 70.4 | 83.2  | 90.2   | 87.9   | 83.8     | 67.9      | 73.1   | 83.5   |
| BingoGuard-phi3-3B     | 72.5    | 72.8 | 90.0  | 90.8   | 88.9   | 86.2     | 69.9      | 79.7   | 85.1   |
| BingoGuard-llama3.1-8B | 75.7    | 77.9 | 90.4  | 94.9   | 88.9   | 86.4     | 68.7      | 80.1   | 86.4   |
| 🛡️ RoGuard            | 75.8    | 70.5 | 91.1  | 90.2   | 88.7   | 87.5     | 69.7      | 80.0   | 80.7   |

## 🔗 GitHub Repository

You can find the full source code and evaluation framework on GitHub:

👉 [Roblox/RoGuard on GitHub](https://github.com/Roblox/RoGuard)