Added additional notes on safety

README.md
ether0 is a 24B language model trained to reason in English and output molecular structures as SMILES. It is derived from Mistral-Small-24B-Instruct-2501 via fine-tuning and reinforcement learning.

Ask questions in English; they may also include molecules specified as SMILES. The SMILES do not need to be canonical and may contain stereochemistry information. ether0 has limited support for IUPAC names.

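Because prompts may contain non-canonical SMILES, it can be useful to normalize or validate a structure before sending it to the model and after reading its answer. A minimal sketch using RDKit (an assumption for illustration, not a dependency of ether0):

```python
from rdkit import Chem


def canonicalize(smiles: str) -> str:
    """Return RDKit's canonical SMILES, raising ValueError on invalid input."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


# Two different (non-canonical vs. canonical) spellings of pentanoic acid
# normalize to the same string.
assert canonicalize("OC(=O)CCCC") == canonicalize("CCCCC(=O)O")
```

The same check can be applied to SMILES in the model's output to catch malformed structures early.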
## Usage

It has been trained specifically for these tasks:

* IUPAC names to structures
* formulas to structures
* modifying solubilities by a specific LogS
* constrained edits (e.g., do not affect group X or do not affect scaffold)
* pKa
* smell/scent
* blood-brain barrier permeability

For example, you can ask "Propose a molecule with a pKa of 9.2" or "Modify CCCCC(=O)O to increase its pKa by about 1 unit." You cannot ask it "What is the pKa of CCCCC(=O)O?"
If you ask it questions that lie significantly beyond those tasks, it can fail. You can combine properties, although we haven't significantly benchmarked this.

## Limitations

See our [preprint](arxiv.org) for details on data and training process.

## Safety

We performed refusal post-training for compounds listed on OPCW schedules 1 and 2. We also post-trained ether0 to refuse questions about standard malicious topics like making explosives or poisons. Because the model knows pharmacokinetics, it can modulate toxicity; however, the structures of toxic or narcotic compounds are generally known, so we do not consider this a safety risk. The model provides no uplift on "tacit knowledge" tasks like purification, scale-up, or processing beyond a web search or a similarly sized language model.

## License

Open-weights (Apache 2.0)