Fix syntax in README
README.md (CHANGED)
@@ -1,4 +1,4 @@
-
+---
 license: apache-2.0
 datasets:
 - common-pile/comma_v0.1_training_dataset
@@ -26,7 +26,7 @@ It performs comparably to budget-matched models (7 billion parameters, 1 trillion
 Comma v0.1 is a decoder-only transformer that uses the same architecture as Llama 3.
 Training was done in two stages: first on 965 billion tokens with a cosine learning rate schedule, and second a "cool-down" training phase on 35 billion tokens from high-quality sources.
 The final model is the average of 10 checkpoints during this cool-down phase.
-Training was performed using [https://github.com/facebookresearch/lingua/
+Training was performed using [lingua](https://github.com/facebookresearch/lingua/) on 64 Nvidia H100 GPUs.
 Hyperparameters can be found in our [lingua config file](https://huggingface.co/common-pile/comma-v0.1-checkpoints/blob/main/config.yaml).
 
 ## Limitations
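The two-stage schedule described in the model card (cosine decay over the 965-billion-token main run, then a short cool-down on 35 billion high-quality tokens) can be pictured with a small sketch. This is an illustration only: the cool-down is assumed linear-to-zero, the function names are made up here, and the real warmup, peak learning rate, and decay floor live in the released config.yaml, not in this snippet.

```python
import math

def stage1_cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float) -> float:
    """Cosine decay from peak_lr down to min_lr over the main training run."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def stage2_cooldown_lr(step: int, cooldown_steps: int, start_lr: float) -> float:
    """Anneal to zero during the cool-down phase (assumed linear shape)."""
    return start_lr * max(0.0, 1.0 - step / cooldown_steps)
```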
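Likewise, "the average of 10 checkpoints" refers to averaging model weights across saved checkpoints. A minimal PyTorch sketch follows, assuming a uniform average over flat state_dicts on disk; the file names are placeholders, and this is not the repository's actual merging script.

```python
import torch

def average_checkpoints(paths):
    """Uniformly average model weights from several checkpoint files.

    Assumes each file is a flat state_dict of tensors; adapt the loading
    logic to the actual checkpoint format.
    """
    avg = None
    n = len(paths)
    for p in paths:
        state = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.float() / n for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float() / n
    return avg

# Hypothetical usage for the 10 cool-down checkpoints mentioned above:
# merged = average_checkpoints([f"checkpoint_{i}.pt" for i in range(10)])
# torch.save(merged, "comma_v0.1_averaged.pt")
```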