Update README.md
README.md

language:
- en
pipeline_tag: text-generation
---

# INTELLECT-1

## **Model Overview**
**INTELLECT-1** is the first collaboratively trained 10-billion-parameter language model, trained from scratch on 1 trillion tokens of English text and code.

**INTELLECT-1** was trained on up to 14 concurrent nodes distributed across 3 continents, with 30 independent community contributors providing compute.
The training code utilizes the [prime framework](https://github.com/PrimeIntellect-ai/prime), a scalable distributed training framework designed for fault-tolerant, dynamically scaling, high-performance training on unreliable, globally distributed workers.
The key abstraction that allows dynamic scaling is the `ElasticDeviceMesh`, which manages dynamic global process groups for fault-tolerant communication across the internet and local process groups for communication within a node.
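
The `ElasticDeviceMesh` implementation lives in the prime repo and is not reproduced here, but the core idea can be sketched with plain `torch.distributed` primitives: keep a stable intra-node group, and rebuild the global inter-node group whenever a worker joins or drops. The class name, methods, and rendezvous handling below are illustrative assumptions, not prime's actual API:

```python
# Conceptual sketch only, NOT prime's ElasticDeviceMesh API.
# Idea: local (intra-node) group membership is stable; the global
# (inter-node) group is re-created whenever workers join or leave.
import torch
import torch.distributed as dist

class ElasticMeshSketch:
    def __init__(self, local_ranks: list[int]):
        # Stable group of ranks on this node, used for fast local collectives.
        self.local_group = dist.new_group(ranks=local_ranks)
        # Start with every rank in the global group.
        self.global_group = dist.group.WORLD

    def rebuild_global(self, live_ranks: list[int]) -> None:
        # All surviving processes must call this collectively; a real elastic
        # system also needs a rendezvous step to agree on live_ranks.
        self.global_group = dist.new_group(ranks=live_ranks)

    def global_all_reduce(self, tensor: torch.Tensor) -> None:
        # Cross-internet communication uses the (possibly rebuilt) global
        # group rather than the default world group.
        dist.all_reduce(tensor, group=self.global_group)
```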

The global all-reduce was done with custom int8 all-reduce kernels to reduce the communication payload, greatly reducing the communication overhead.
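
The custom kernels themselves are in the prime codebase; purely as an illustration of why int8 helps, here is a minimal quantize, communicate, dequantize round trip in plain PyTorch. The symmetric per-tensor scaling and the use of `all_gather` are assumptions of this sketch, not the actual kernel design:

```python
import torch
import torch.distributed as dist

def int8_all_reduce(tensor: torch.Tensor) -> torch.Tensor:
    """Averaging all-reduce that ships int8 over the wire instead of fp32."""
    world = dist.get_world_size()
    # Agree on one symmetric per-tensor scale so all ranks dequantize alike.
    scale = tensor.abs().max().clamp(min=1e-8) / 127.0
    dist.all_reduce(scale, op=dist.ReduceOp.MAX)
    q = torch.clamp((tensor / scale).round(), -127, 127).to(torch.int8)
    # Exchange the int8 payload (4x fewer bytes than fp32), then
    # dequantize and average locally.
    buf = [torch.empty_like(q) for _ in range(world)]
    dist.all_gather(buf, q)
    return torch.stack(buf).to(torch.float32).sum(dim=0) * scale / world
```
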
For more detailed technical insights, please refer to our [technical paper](https://github.com/PrimeIntellect-ai/prime).

**Note: The model will immediately output an EOS token if the BOS token is not set. This is a result of the tensor packing used during training and can lead to very poor evaluation scores. Make sure a BOS token is prepended to every prompt.**
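
For example, when tokenizing prompts by hand, it is worth verifying that a BOS token is actually present. This is a minimal sketch using the standard `transformers` tokenizer API; whether `encode` already prepends BOS depends on the tokenizer configuration shipped with the checkpoint:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PrimeIntellect/INTELLECT-1")
ids = tokenizer.encode("What is the Metamorphosis of Prime Intellect about?", return_tensors="pt")

# Prepend BOS manually if the tokenizer configuration did not add it.
if tokenizer.bos_token_id is not None and ids[0, 0].item() != tokenizer.bos_token_id:
    bos = torch.full((1, 1), tokenizer.bos_token_id, dtype=ids.dtype)
    ids = torch.cat([bos, ids], dim=1)
```
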
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")
model = AutoModelForCausalLM.from_pretrained("PrimeIntellect/INTELLECT-1")
tokenizer = AutoTokenizer.from_pretrained("PrimeIntellect/INTELLECT-1")

input_text = "What is the Metamorphosis of Prime Intellect about?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate and decode a completion (the generation settings are illustrative).
output_ids = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The model can also be used via the `pipeline` API:

```python
import torch
from transformers import pipeline
torch.set_default_device("cuda")

pipe = pipeline("text-generation", model="PrimeIntellect/INTELLECT-1")
print(pipe("What is the Metamorphosis of Prime Intellect about?"))
```