Update README.md
README.md CHANGED

@@ -97,7 +97,6 @@ model-index:

| Replit Code V1.5 | 3B | 23.0% | 25.9% | 26.2% | 23.6% | 23.2% | 21.5% |
| Deci Coder       | 1B | 19.1% | 6.8%  | 18.4% | 16.7% | 2.1%  | 1.7%  |

**Key Features**

* Fill-in-Middle (FIM) capability (see the sketch below)
* Supports long context; trained with sequences up to 16,384 tokens
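
A minimal sketch of how FIM prompting is typically exercised through `transformers`. The checkpoint id is a placeholder, and the `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` sentinel tokens are an assumption (the StarCoder-style convention); check the model's tokenizer configuration for the actual special tokens.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-code-model"  # placeholder: substitute the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

# FIM rearranges the prompt so the model fills the gap between a known
# prefix and suffix instead of only continuing left-to-right.
prefix = "def fibonacci(n):\n    "
suffix = "\n    return a\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"  # assumed sentinel tokens

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
completion = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```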

@@ -207,6 +206,26 @@ The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2

The dataset comprises a filtered mixture of open-source, large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): a Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), along with [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) and [GitHub Issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues) (BigCode, 2023), and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)). We further supplement our training with data from mathematical domains ([Azerbayev et al., 2023](https://arxiv.org/abs/2310.10631); [Yu et al., 2023](https://arxiv.org/abs/2309.12284)).

Top 18 programming languages trained on:

- C
- C++
- Java
- JavaScript
- CSS
- Go
- HTML
- Ruby
- Rust
- Markdown
- Shell
- PHP
- SQL
- R
- TypeScript
- Python
- Jupyter-Clean
- reStructuredText
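
As a rough illustration only (not the actual training recipe), the individual sources can be streamed from the Hub with the `datasets` library and inspected before mixing; the CommitPackFT `"python"` subset below is an arbitrary placeholder, and a real run would weight and interleave the streams (e.g. with `datasets.interleave_datasets`).

```python
from datasets import load_dataset

# Stream each source so nothing has to be fully downloaded up front
# (assumes you have access to these datasets on the Hub).
sources = {
    "refinedweb": load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True),
    "commitpackft": load_dataset("bigcode/commitpackft", "python", split="train", streaming=True),
    "github_issues": load_dataset("bigcode/the-stack-github-issues", split="train", streaming=True),
}

# Peek at one example per source; the available columns differ per dataset.
for name, stream in sources.items():
    example = next(iter(stream))
    print(name, sorted(example.keys()))
```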

### Training Procedure

The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW.
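
A toy sketch of the stated setup (`bfloat16` parameters optimized with AdamW) in PyTorch. The model, data, and hyperparameter values are placeholders; real pre-training typically also uses fp32 optimizer state, a learning-rate schedule, and distributed sharding, none of which are shown here.

```python
import torch
from torch.optim import AdamW

# Stand-in module; the real model is the decoder-only transformer described above.
model = torch.nn.Linear(1024, 1024).to(dtype=torch.bfloat16)

# AdamW with placeholder hyperparameters -- the card states the optimizer
# family and precision, not the learning rate, betas, or weight decay.
optimizer = AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)

# One illustrative optimization step on random bfloat16 data.
x = torch.randn(8, 1024, dtype=torch.bfloat16)
loss = model(x).float().pow(2).mean()  # reduce in float32 for numerical stability
loss.backward()
optimizer.step()
optimizer.zero_grad()
```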