Add title NemoCurator Instruction Data Guard

README.md CHANGED

```diff
@@ -5,12 +5,14 @@ tags:
 license: other
 ---
 
+# NemoCurator Instruction Data Guard
+
 # Model Overview
 
 ## Description:
-Instruction
+Instruction Data Guard is a deep-learning classification model that helps identify LLM poisoning attacks in datasets.
 It is trained on an instruction:response dataset and LLM poisoning attacks of such data.
-Note that optimal use for Instruction
+Note that optimal use for Instruction Data Guard is for instruction:response datasets.
 
 ### License/Terms of Use:
 [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf)
```

```diff
@@ -60,7 +62,7 @@ v1.0 <br>
 * Synthetic <br>
 
 ## Evaluation Benchmarks:
-Instruction
+Instruction Data Guard is evaluated based on two overarching criteria: <br>
 * Success on identifying LLM poisoning attacks, after the model was trained on examples of the attacks. <br>
 * Success on identifying LLM poisoning attacks, but without training on examples of those attacks, at all. <br>
```

```diff
@@ -127,7 +129,7 @@ class InstructionDataGuardNet(torch.nn.Module, PyTorchModelHubMixin):
         x = self.sigmoid(x)
         return x
 
-# Load Instruction
+# Load Instruction Data Guard classifier
 instruction_data_guard = InstructionDataGuardNet.from_pretrained("nvidia/instruction-data-guard")
 instruction_data_guard = instruction_data_guard.to(device)
 instruction_data_guard = instruction_data_guard.eval()
```
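Since the classifier's forward pass ends in a sigmoid, it emits a per-sample score in (0, 1) that downstream code must turn into a poisoned/clean decision. A minimal pure-Python sketch of that decision step — the `label_scores` helper and the `0.5` threshold are illustrative assumptions, not part of the model card:

```python
def label_scores(scores, threshold=0.5):
    """Map sigmoid outputs from the classifier to poisoned/clean labels.

    `threshold` is a hypothetical cutoff, not a documented default;
    in practice it would be tuned on held-out labeled data.
    """
    return ["poisoned" if score > threshold else "clean" for score in scores]

# Scores near 0 are confidently clean; scores near 1 are confidently poisoned.
print(label_scores([0.02, 0.51, 0.97]))  # → ['clean', 'poisoned', 'poisoned']
```

Raising the threshold trades recall for precision: fewer clean samples are flagged, at the cost of missing borderline attacks.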