Commit df69d83 (committed by ybelkada)
Parent(s): 0a1d0a4

Update README.md (#1)

- Update README.md (2b6926370063828b7c226ad18232cd564b18dc49)
README.md (changed)
@@ -122,11 +122,11 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
 * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
 
-*
+* 2.5 billion parameters:
 
-*
+* 30 layers, 32 attention heads
 
-* Hidden layers are
+* Hidden layers are 2560-dimensional
 
 * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
 
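The architecture bullets filled in by this commit (30 layers, 32 attention heads, 2560-dimensional hidden layers, ALiBi) can be sanity-checked with a little arithmetic. The sketch below is an illustration under stated assumptions, not part of the commit: it assumes a BLOOM-style block (fused QKV projection, 4x MLP expansion, biases on every linear layer, two LayerNorms per block, an extra LayerNorm after the embeddings and a final LayerNorm, tied input/output embeddings) and an embedding matrix of 250,880 rows, the padded vocabulary size used in BLOOM configs, which the diff does not state. It also computes the per-head width and the ALiBi head slopes 2^(-8i/n) from the ALiBi paper, for power-of-two head counts only. Under these assumptions the total comes out at roughly 3.0 billion parameters, about 2.36 billion of which sit in the transformer blocks, so the "2.5 billion" in the new README text is best read as an approximate figure.

```python
# Rough architecture arithmetic for the configuration listed in the diff.
# ASSUMPTIONS (not stated in the diff): vocabulary/embedding rows = 250,880,
# 4x MLP expansion, biases on all linear layers, two LayerNorms per block,
# embedding LayerNorm + final LayerNorm, tied input/output embeddings.

N_LAYER = 30        # "30 layers"
N_HEAD = 32         # "32 attention heads"
HIDDEN = 2560       # "Hidden layers are 2560-dimensional"
VOCAB = 250_880     # assumption: padded BLOOM vocabulary size


def head_dim(hidden: int, n_head: int) -> int:
    """Width of each attention head."""
    assert hidden % n_head == 0
    return hidden // n_head


def block_params(h: int) -> int:
    """Parameters in one transformer block (weights + biases)."""
    qkv = 3 * h * h + 3 * h          # fused query/key/value projection
    attn_out = h * h + h             # attention output projection
    mlp_up = h * 4 * h + 4 * h       # h -> 4h
    mlp_down = 4 * h * h + h         # 4h -> h
    layer_norms = 2 * (2 * h)        # two LayerNorms, each with weight and bias
    return qkv + attn_out + mlp_up + mlp_down + layer_norms


def total_params(n_layer: int, h: int, vocab: int) -> dict:
    """Split the parameter count into blocks, embeddings and total."""
    blocks = n_layer * block_params(h)
    embeddings = vocab * h           # shared with the output projection (tied)
    extra_norms = 2 * (2 * h)        # embedding LayerNorm + final LayerNorm
    return {
        "blocks": blocks,
        "embeddings": embeddings,
        "total": blocks + embeddings + extra_norms,
    }


def alibi_slopes(n_head: int) -> list[float]:
    """ALiBi slopes 2^(-8i/n) for i = 1..n; power-of-two head counts only."""
    assert n_head & (n_head - 1) == 0, "this sketch handles powers of two only"
    return [2 ** (-8 * i / n_head) for i in range(1, n_head + 1)]


if __name__ == "__main__":
    counts = total_params(N_LAYER, HIDDEN, VOCAB)
    print("head dim:", head_dim(HIDDEN, N_HEAD))     # 80
    print(f"blocks:     {counts['blocks']:,}")      # 2,360,294,400
    print(f"embeddings: {counts['embeddings']:,}")  # 642,252,800
    print(f"total:      {counts['total']:,}")       # 3,002,557,440
    print("smallest ALiBi slope:", alibi_slopes(N_HEAD)[-1])
```

Changing `VOCAB` or the block layout in `block_params` changes the totals, so the exact figures hold only for the assumptions listed at the top of the sketch.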