InstaDeepAI/ChatNT

bernardo-de-almeida committed on Apr 17 · commit 20b65fe (verified) · parent 7064526

Update README.md

Files changed (1): README.md +1 -0

README.md CHANGED

@@ -30,6 +30,7 @@ an autoregressive fashion, using low‑temperature sampling to produce classific
 ### Training Data
 ChatNT was instruction‑tuned on a unified corpus covering 27 diverse tasks from DNA, RNA and proteins, spanning multiple species, tissues and biological processes.
 This amounted to 605 million DNA tokens (≈ 3.6 billion bases) and 273 million English tokens, sampled uniformly over tasks for 2 billion instruction tokens.
+Examples of questions and sequences for each task, as well as additional task information, can be found in [Datasets_overview.csv](Datasets_overview.csv).
 ### Tokenization
 DNA inputs are broken into overlapping 6‑mer tokens and padded or truncated to 2048 tokens (~ 12 kb). English prompts and