Update README.md
README.md
CHANGED
@@ -13,7 +13,7 @@ base_model:
 # Granite-3.1-8B-Instruct
 
 **Model Summary:**
-Granite-3.1-8B-Instruct is a 8B parameter model finetuned from *Granite-3.1-8B-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging.
+Granite-3.1-8B-Instruct is an 8B parameter long-context instruct model finetuned from *Granite-3.1-8B-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets tailored for solving long context problems. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging.
 
 - **Developers:** Granite Team, IBM
 - **GitHub Repository:** [ibm-granite/granite-3.1-language-models](https://github.com/ibm-granite/granite-3.1-language-models)

@@ -37,6 +37,7 @@ The model is designed to respond to general instructions and can be used to buil
 * Code related tasks
 * Function-calling tasks
 * Multilingual dialog use cases
+* Long-context tasks including long document/meeting summarization, long document QA, etc.
 
 **Generation:**
 This is a simple example of how to use Granite-3.1-8B-Instruct model.

@@ -98,7 +99,7 @@ Granite-3.1-8B-Instruct is based on a decoder-only dense transformer architectur
 | # Training tokens | 12T | **12T** | 10T | 10T |
 
 **Training Data:**
-Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the [Granite Technical Report]() and [Accompanying Author List]().
+Overall, our SFT data is largely comprised of three key sources: (1) publicly available datasets with permissive license, (2) internal synthetic data targeting specific capabilities including long-context tasks, and (3) very small amounts of human-curated data. A detailed attribution of datasets can be found in the [Granite Technical Report]() and [Accompanying Author List]().
 
 **Infrastructure:**
 We train Granite 3.1 Language Models using IBM's super computing cluster, Blue Vela, which is outfitted with NVIDIA H100 GPUs. This cluster provides a scalable and efficient infrastructure for training our models over thousands of GPUs.
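The **Generation** section touched by the second hunk points to a usage example that is not included in this excerpt. As a point of reference only, a minimal sketch of querying the instruct model through Hugging Face `transformers` and its structured chat format could look like the following; the model ID, dtype, and generation settings are assumptions for illustration, not taken from the diff.

```python
# Minimal sketch (not part of the diff): querying Granite-3.1-8B-Instruct with transformers.
# The model ID and generation settings below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.1-8b-instruct"  # assumed Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",           # place weights automatically across available devices
    torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
)
model.eval()

# Build the structured chat prompt the instruct model expects.
chat = [{"role": "user", "content": "Summarize the key ideas of supervised finetuning in two sentences."}]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

For the long-context use cases added in this change (long document or meeting summarization, long document QA), the call is the same; only the user message content grows to include the document text.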