### Adversarial Testing and Red Teaming Efforts

The Nemotron-4 340B-Instruct model underwent safety evaluation, including adversarial testing, via three distinct methods:

- [Garak](https://docs.garak.ai/garak), an automated LLM vulnerability scanner that probes for common weaknesses, including prompt injection and data leakage.
- AEGIS, a content safety evaluation dataset and LLM-based content safety classifier model that adheres to a broad taxonomy of 13 categories of critical risks in human-LLM interactions.
- Human Content Red Teaming, leveraging human interaction and evaluation of the model's responses.
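As a rough sketch of what a scan like the Garak run above looks like in practice: the snippet below launches garak's prompt-injection probes against a Hugging Face-hosted model. The model name and probe selection here are illustrative assumptions, not the exact configuration used in this evaluation; consult the garak documentation for the current CLI flags.

```shell
# Illustrative garak scan (not the exact configuration used for this model card).
# Guarded so the sketch degrades gracefully when garak is not installed.
if command -v garak >/dev/null 2>&1; then
  # Probe a Hugging Face model for prompt-injection weaknesses.
  garak --model_type huggingface \
        --model_name nvidia/Nemotron-4-340B-Instruct \
        --probes promptinject
else
  echo "garak not installed; see https://docs.garak.ai/garak"
fi
```

Garak writes its findings to a report log, which can then be reviewed alongside the AEGIS classifier results and human red-teaming notes.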