OpenSafetyLab
/

MD-Judge-v0.1

Text Generation

text-generation-inference

Model card Files Files and versions

Foreshhh commited on Mar 4, 2024

Commit

475b9b9

·

verified ·

1 Parent(s): b337fc0

Update README.md

Files changed (1) hide show

README.md +19 -0

README.md CHANGED Viewed

@@ -35,6 +35,25 @@ MD-Judge was born to study the safety of different LLMs serving as an general ev
 - **Repository:** [SALAD-Bench Github](https://github.com/OpenSafetyLab/SALAD-BENCH)
 - **Paper:** [SALAD-BENCH](https://arxiv.org/abs/2402.02416)
 ## Uses
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM

 - **Repository:** [SALAD-Bench Github](https://github.com/OpenSafetyLab/SALAD-BENCH)
 - **Paper:** [SALAD-BENCH](https://arxiv.org/abs/2402.02416)
+## Model Performance
+Compare our MD-Judge model with other methods on different public safety testsets using QA format. All the model-based methods are evaluated using the same safety proxy template.
+- Keyword
+- GPT-3.5: https://platform.openai.com/docs/models/gpt-3-5-turbo
+- GPT-4: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
+- LlamaGuard: https://huggingface.co/meta-llama/LlamaGuard-7b
+| **Methods** | **Base** | **Enhance** | **ToxicChat** | **Beaver** | **SafeRLHF** |
+|-------------|----------|-------------|--------|------------|--------------|
+| Keyword     | 0.058    | 0.261       | 0.193  | 0.012      | 0.015        |
+| LlamaGuard  | 0.585    | 0.085       | 0.220  | 0.653      | 0.693        |
+| GPT-3.5     | 0.374    | 0.731       | 0.499  | 0.800      | 0.771        |
+| GPT-4       | 0.785    | 0.827       | 0.470  | 0.842      | 0.835        |
+| MD-Judge    | **0.818**| **0.873**   | **0.644** | **0.866**  | **0.864**    |
+> Comparison of F1 scores between our model and other leading methods. Best results are **bolded** and second best are *underlined*. Base and Enhance indicate our SALAD-Base-Test and SALAD-Enhance-Test, TC means ToxicChat, and Beaver means Beavertails.
 ## Uses
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM