prepare datasets
- README.md +5 -35
- scripts/pretrain-core-model-0.yaml +4 -3
README.md
CHANGED

@@ -44,7 +44,7 @@ tags:
 - reason
 ---
 
-# tangled-alpha-0.
+# tangled-alpha-0.3-core
 
 ![logo](./misc/logo.jpg)
 
@@ -53,44 +53,14 @@ time python -B prepare_core_datasets.py
 ```
 
 ```
-
-Workers are finished.██| 220/220 [23:15<00:00, 6.34s/it]
-Finished data processing!
-i=0, block_size=8192, chunk_size=16384000, len(dataset)=893355, len(dataset) * block_size=7318364160
-Total number of tokens in the optimized dataset '../core-data-0-8192-2000' is 7318364160
+# ...
 ```
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model-0.yaml
 ```
 
 ```
-Seed set to 23
-Time to instantiate model: 0.32 seconds.
-Total parameters: 217,088,512
-Verifying settings ...
-Measured TFLOPs: 3548.40
-
-Epoch 1 | iter 256 step 1 | loss train: 11.716, val: n/a | iter time: 1735.26 ms (step) remaining time: 4 days, 11:06:29
-Epoch 1 | iter 512 step 2 | loss train: 11.534, val: n/a | iter time: 1102.77 ms (step) remaining time: 4 days, 2:31:30
-Epoch 1 | iter 768 step 3 | loss train: 11.356, val: n/a | iter time: 1095.87 ms (step) remaining time: 3 days, 23:44:12
-Epoch 1 | iter 1024 step 4 | loss train: 11.162, val: n/a | iter time: 1099.92 ms (step) remaining time: 3 days, 22:18:27
-Epoch 1 | iter 1280 step 5 | loss train: 11.018, val: n/a | iter time: 1096.45 ms (step) remaining time: 3 days, 21:24:35
-Epoch 1 | iter 1536 step 6 | loss train: 10.901, val: n/a | iter time: 1093.65 ms (step) remaining time: 3 days, 20:48:11
-Epoch 1 | iter 1792 step 7 | loss train: 10.850, val: n/a | iter time: 1100.16 ms (step) remaining time: 3 days, 20:22:00
-Epoch 1 | iter 2048 step 8 | loss train: 10.780, val: n/a | iter time: 1092.67 ms (step) remaining time: 3 days, 20:01:57
-Epoch 1 | iter 2304 step 9 | loss train: 10.692, val: n/a | iter time: 1095.77 ms (step) remaining time: 3 days, 19:45:57
-Epoch 1 | iter 2560 step 10 | loss train: 10.678, val: n/a | iter time: 1092.12 ms (step) remaining time: 3 days, 19:32:43
-Epoch 1 | iter 2816 step 11 | loss train: 10.619, val: n/a | iter time: 1094.44 ms (step) remaining time: 3 days, 19:21:32
-Epoch 1 | iter 3072 step 12 | loss train: 10.588, val: n/a | iter time: 1102.51 ms (step) remaining time: 3 days, 19:12:30
-Epoch 1 | iter 3328 step 13 | loss train: 10.514, val: n/a | iter time: 1095.57 ms (step) remaining time: 3 days, 19:04:07
-Epoch 1 | iter 3584 step 14 | loss train: 10.472, val: n/a | iter time: 1104.00 ms (step) remaining time: 3 days, 18:56:56
-Epoch 1 | iter 3840 step 15 | loss train: 10.431, val: n/a | iter time: 1096.00 ms (step) remaining time: 3 days, 18:50:21
-Epoch 1 | iter 4096 step 16 | loss train: 10.392, val: n/a | iter time: 1098.34 ms (step) remaining time: 3 days, 18:44:25
-Epoch 1 | iter 4352 step 17 | loss train: 10.360, val: n/a | iter time: 1106.53 ms (step) remaining time: 3 days, 18:38:58
-Epoch 1 | iter 4608 step 18 | loss train: 10.329, val: n/a | iter time: 1084.95 ms (step) remaining time: 3 days, 18:33:58
-Epoch 1 | iter 4864 step 19 | loss train: 10.296, val: n/a | iter time: 1096.22 ms (step) remaining time: 3 days, 18:29:12
-Epoch 1 | iter 5120 step 20 | loss train: 10.236, val: n/a | iter time: 1093.39 ms (step) remaining time: 3 days, 18:24:51
 # ...
 ```
 
@@ -103,11 +73,11 @@ mv wandb wandb-pretrain-core
 Chat with model:
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core-0/final
 ```
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core-0/final'
 ```
 
 ```
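A quick sanity check on the dataset figures quoted in the trimmed log above: the reported token total is simply the block count times the block size. A minimal shell sketch, using only the numbers from the old log:

```bash
# len(dataset)=893355 blocks of block_size=8192 tokens each
echo $((893355 * 8192))  # prints 7318364160, matching the reported total
```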
scripts/pretrain-core-model-0.yaml
CHANGED

@@ -25,7 +25,7 @@ model_config:
 
 # Directory in which to save checkpoints and logs. If running in a Lightning Studio Job, look for it in
 # /teamspace/jobs/<job-name>/share. (type: <class 'Path'>, default: out/pretrain)
-out_dir: "../out/pretrain-core/"
+out_dir: "../out/pretrain-core-0/"
 
 # The precision to use for pretraining. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
 # precision: bf16-mixed
@@ -60,6 +60,7 @@ train:
   # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 512)
   global_batch_size: 512
   # global_batch_size: 256
+  # global_batch_size: 128
 
   # Number of samples per data-parallel rank (type: int, default: 4)
   micro_batch_size: 4
@@ -67,7 +68,7 @@ train:
   # micro_batch_size: 1
 
   # Number of iterations with learning rate warmup active (type: int, default: 2000)
-  lr_warmup_steps:
+  lr_warmup_steps: 500
 
   # Number of epochs to train on (type: Optional[int], default: null)
   epochs:
@@ -93,7 +94,7 @@ train:
 # Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
 eval:
   # Number of optimizer steps between evaluation calls (type: int, default: 1000)
-  interval:
+  interval: 100
 
   # Number of tokens to generate (type: Optional[int], default: null)
   max_new_tokens:
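For context on the batch settings retained above: litgpt derives its gradient-accumulation count from global_batch_size, micro_batch_size, and the device count. A minimal sketch of that arithmetic, assuming the single visible GPU implied by the CUDA_VISIBLE_DEVICES=0 commands in the README:

```bash
# accumulation iters = global_batch_size / (micro_batch_size * num_devices)
echo $((512 / (4 * 1)))  # 128 micro-batches accumulated per optimizer step
```

Per the config's own comments, the newly pinned lr_warmup_steps: 500 counts warmup iterations, while eval interval: 100 counts optimizer steps between validation calls.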