adibvafa/CodonTransformer-base

	---
	library_name: transformers
	tags:
	- CodonTransformer
	- Computational Biology
	- Machine Learning
	- Bioinformatics
	- Synthetic Biology
	license: apache-2.0
	pipeline_tag: token-classification
	---

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c9888b3137cc529d0761c4/GqKutRwiGGif69Gjd8Df3.png)
	Note: this is the pretrained model. We recommend using the fine-tuned model available at https://huggingface.co/adibvafa/CodonTransformer


	CodonTransformer is a codon optimization tool that transforms protein sequences into DNA sequences optimized for a target organism. It is intended for researchers and practitioners in genetic engineering. Built on the Transformer architecture and accompanied by a Jupyter notebook, it reduces the time and effort needed to codon-optimize a sequence.
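
To make the task concrete: most amino acids are encoded by several synonymous codons, so the number of DNA sequences that encode a given protein grows multiplicatively with its length. The sketch below counts them under the standard genetic code. The name `count_encodings` is illustrative and is not part of the CodonTransformer package.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
# "*" denotes a stop codon (TAA, TAG, TGA).
SYNONYMOUS_CODONS = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3, "*": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_encodings(protein: str) -> int:
    """Return how many distinct DNA sequences encode `protein`."""
    return math.prod(SYNONYMOUS_CODONS[aa] for aa in protein)


# M and W have one codon each, A has four, L has six: 1 * 4 * 6 * 1 = 24
print(count_encodings("MALW"))  # 24
```

Even this four-residue peptide has 24 possible encodings, which is why choosing codons that suit a specific host organism is treated as an optimization problem.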

	## Use Case
	For an interactive demo, check out our [Google Colab Notebook](https://adibvafa.github.io/CodonTransformer/GoogleColab).
	<br>
	After installing CodonTransformer (for example, with `pip install CodonTransformer`), you can run:
	```python
	import torch
	from transformers import AutoTokenizer, BigBirdForMaskedLM
	from CodonTransformer.CodonPrediction import predict_dna_sequence
	from CodonTransformer.CodonJupyter import format_model_output

	# Use a GPU when available, otherwise fall back to the CPU
	DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


	# Load model and tokenizer
	tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
	model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer").to(DEVICE)


	# Set your input data
	protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
	organism = "Escherichia coli general"


	# Predict with CodonTransformer
	output = predict_dna_sequence(
	    protein=protein,
	    organism=organism,
	    device=DEVICE,
	    tokenizer_object=tokenizer,
	    model_object=model,
	    attention_type="original_full",
	)
	print(format_model_output(output))
	```
	The output is:
	<br>


	```python
	-----------------------------
	| Organism |
	-----------------------------
	Escherichia coli general

	-----------------------------
	| Input Protein |
	-----------------------------
	MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG

	-----------------------------
	| Processed Input |
	-----------------------------
	M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK

	-----------------------------
	| Predicted DNA |
	-----------------------------
	ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
	```
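
One quick sanity check on a prediction like the one above is to translate the DNA back and confirm that it encodes the input protein. The following is a minimal sketch using the standard genetic code. `translate` and `encodes_protein` are hypothetical helper names, not functions from the CodonTransformer package, and the sketch assumes the predicted sequence may end with a single stop codon, as in the example output.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon ("*" marks a stop)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes_protein(dna: str, protein: str) -> bool:
    """Check that `dna` translates to `protein`, ignoring one trailing stop."""
    translated = translate(dna)
    if translated.endswith("*"):
        translated = translated[:-1]
    return translated == protein


# First four codons and the stop codon of the predicted DNA shown above
print(translate("ATGGCTTTATGGTAA"))                # MALW*
print(encodes_protein("ATGGCTTTATGGTAA", "MALW"))  # True
```

The same check can be applied to the full predicted DNA string and the original protein sequence.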


	## Additional Resources
	- Project Website <br>
	https://adibvafa.github.io/CodonTransformer/

	- GitHub Repository <br>
	https://github.com/Adibvafa/CodonTransformer

	- Google Colab Demo <br>
	https://adibvafa.github.io/CodonTransformer/GoogleColab

	- PyPI Package <br>
	https://pypi.org/project/CodonTransformer/

	- Paper <br>
	TBD