---
library_name: transformers
tags:
- CodonTransformer
- Computational Biology
- Machine Learning
- Bioinformatics
- Synthetic Biology
license: apache-2.0
pipeline_tag: token-classification
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c9888b3137cc529d0761c4/GqKutRwiGGif69Gjd8Df3.png)
**Note: this is the pretrained model. We recommend using the fine-tuned model, available at https://huggingface.co/adibvafa/CodonTransformer**
**CodonTransformer** is a tool for codon optimization: it converts protein sequences into DNA sequences optimized for your target organism. Built on the Transformer architecture and shipped with a user-friendly Jupyter notebook, it reduces the complexity of codon optimization for researchers and practitioners in genetic engineering.
## Use Case
**For an interactive demo, check out our [Google Colab Notebook.](https://adibvafa.github.io/CodonTransformer/GoogleColab)**
<br>
After installing CodonTransformer (see the PyPI package under Additional Resources), you can run a prediction as follows:
```python
import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence
from CodonTransformer.CodonJupyter import format_model_output
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer").to(DEVICE)
# Set your input data
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
organism = "Escherichia coli general"
# Predict with CodonTransformer
output = predict_dna_sequence(
    protein=protein,
    organism=organism,
    device=DEVICE,
    tokenizer_object=tokenizer,
    model_object=model,
    attention_type="original_full",
)
print(format_model_output(output))
```
The output is:
<br>
```
-----------------------------
| Organism |
-----------------------------
Escherichia coli general
-----------------------------
| Input Protein |
-----------------------------
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
-----------------------------
| Processed Input |
-----------------------------
M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK
-----------------------------
| Predicted DNA |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
```
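A quick way to sanity-check a prediction is to translate the DNA back into amino acids and compare the result with the input protein. The sketch below is not part of CodonTransformer: `CODON_TABLE`, `translate`, and `encodes` are hypothetical helper names, and it assumes the standard genetic code, which may not be the right table for every organism.

```python
from itertools import product

# Standard genetic code, written in the usual compact form:
# codons enumerated with bases in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon (hypothetical helper)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes(dna: str, protein: str) -> bool:
    """Return True if `dna` translates to `protein`, ignoring one trailing stop."""
    translated = translate(dna)
    if translated.endswith("*"):
        translated = translated[:-1]
    return translated == protein


# First five codons of the predicted DNA shown above
print(translate("ATGGCTTTATGGATG"))  # MALWM
```

To check a full prediction, pass the predicted DNA string and the original protein to `encodes`. A `False` result would indicate that the sequence does not code for the requested protein under the standard table.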
## Additional Resources
- **Project Website** <br>
https://adibvafa.github.io/CodonTransformer/
- **GitHub Repository** <br>
https://github.com/Adibvafa/CodonTransformer
- **Google Colab Demo** <br>
https://adibvafa.github.io/CodonTransformer/GoogleColab
- **PyPI Package** <br>
https://pypi.org/project/CodonTransformer/
- **Paper** <br>
TBD