Commit b1bd9bc1 · Parent(s): 7ce5792

update README

README.md CHANGED
@@ -72,37 +72,41 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
   trust_remote_code=True
 )
 ```
-Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
+Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
 This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
 `MPT` includes options for many training efficiency features such as [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), [QK LayerNorm](https://arxiv.org/abs/2010.04245), and more.
 
-To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model with `attn_impl='triton'` and
+To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model on GPU (`cuda:0`) with `attn_impl='triton'` and with `bfloat16` precision:
 ```python
-config = transformers.AutoConfig.from_pretrained(
-  'mosaicml/mpt-7b-chat',
-  trust_remote_code=True
-)
+import torch
+import transformers
+
+name = 'mosaicml/mpt-7b-chat'
+
+config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.attn_config['attn_impl'] = 'triton'
+config.init_device = 'cuda:0' # For fast initialization directly on GPU!
 
 model = transformers.AutoModelForCausalLM.from_pretrained(
-  'mosaicml/mpt-7b-chat',
+  name,
   config=config,
-  torch_dtype=torch.bfloat16,
+  torch_dtype=torch.bfloat16, # Load model weights in bfloat16
   trust_remote_code=True
 )
-model.to(device='cuda:0')
 ```
 
 Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:
 
 ```python
-config = transformers.AutoConfig.from_pretrained(
-  'mosaicml/mpt-7b-chat',
-  trust_remote_code=True
-)
-config.update({"max_seq_len": 4096})
+import transformers
+
+name = 'mosaicml/mpt-7b-chat'
+
+config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
+config.max_seq_len = 4096 # (input + output) tokens can now be up to 4096
+
 model = transformers.AutoModelForCausalLM.from_pretrained(
-  'mosaicml/mpt-7b-chat',
+  name,
   config=config,
   trust_remote_code=True
 )

@@ -163,11 +167,11 @@ Please cite this model using the following format:
 ```
 @online{MosaicML2023Introducing,
     author    = {MosaicML NLP Team},
-    title     = {Introducing MPT-7B: A New Standard for Open-Source,
+    title     = {Introducing MPT-7B: A New Standard for Open-Source,
     Commercially Usable LLMs},
     year      = {2023},
     url       = {www.mosaicml.com/blog/mpt-7b},
     note      = {Accessed: 2023-03-28}, % change this date
     urldate   = {2023-03-28} % change this date
 }
-```
+```