Fix syntax in README
README.md (CHANGED)
@@ -1,4 +1,4 @@
-
+---
 license: apache-2.0
 datasets:
 - common-pile/comma_v0.1_training_dataset
@@ -26,7 +26,7 @@ It performs comparably to budget-matched models (7 billion parameters, 1 trillion
 Comma v0.1 is a decoder-only transformer that uses the same architecture as Llama 3.
 Training was done in two stages: first on 965 billion tokens with a cosine learning rate schedule, and second a "cool-down" training phase on 35 billion tokens from high-quality sources.
 The final model is the average of 10 checkpoints during this cool-down phase.
-Training was performed using [https://github.com/facebookresearch/lingua/
+Training was performed using [lingua](https://github.com/facebookresearch/lingua/) on 64 Nvidia H100 GPUs.
 Hyperparameters can be found in our [lingua config file](https://huggingface.co/common-pile/comma-v0.1-checkpoints/blob/main/config.yaml).
 
 ## Limitations
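The two-stage schedule described in the model card (cosine decay over the 965-billion-token main run, then a short cool-down on 35 billion high-quality tokens) can be pictured with a small sketch. This is an illustration only: the cool-down is assumed linear-to-zero, the function names are made up here, and the real warmup, peak learning rate, and decay floor live in the released config.yaml, not in this snippet.

```python
import math

def stage1_cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float) -> float:
    """Cosine decay from peak_lr down to min_lr over the main training run."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def stage2_cooldown_lr(step: int, cooldown_steps: int, start_lr: float) -> float:
    """Anneal to zero during the cool-down phase (assumed linear shape)."""
    return start_lr * max(0.0, 1.0 - step / cooldown_steps)
```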
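Likewise, "the average of 10 checkpoints" refers to averaging model weights across saved checkpoints. A minimal PyTorch sketch follows, assuming a uniform average over flat state_dicts on disk; the file names are placeholders, and this is not the repository's actual merging script.

```python
import torch

def average_checkpoints(paths):
    """Uniformly average model weights from several checkpoint files.

    Assumes each file is a flat state_dict of tensors; adapt the loading
    logic to the actual checkpoint format.
    """
    avg = None
    n = len(paths)
    for p in paths:
        state = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.float() / n for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float() / n
    return avg

# Hypothetical usage for the 10 cool-down checkpoints mentioned above:
# merged = average_checkpoints([f"checkpoint_{i}.pt" for i in range(10)])
# torch.save(merged, "comma_v0.1_averaged.pt")
```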