prepare datasets
- README.md +5 -35
- scripts/pretrain-core-model-0.yaml +4 -3
README.md
CHANGED

@@ -44,7 +44,7 @@ tags:
 - reason
 ---
 
-# tangled-alpha-0.
+# tangled-alpha-0.3-core
 
 ![logo](./misc/logo.jpg)
 
@@ -53,44 +53,14 @@ time python -B prepare_core_datasets.py
 ```
 
 ```
-
-Workers are finished.██| 220/220 [23:15<00:00, 6.34s/it]
-Finished data processing!
-i=0, block_size=8192, chunk_size=16384000, len(dataset)=893355, len(dataset) * block_size=7318364160
-Total number of tokens in the optimized dataset '../core-data-0-8192-2000' is 7318364160
+# ...
 ```
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model-0.yaml
 ```
 
 ```
-Seed set to 23
-Time to instantiate model: 0.32 seconds.
-Total parameters: 217,088,512
-Verifying settings ...
-Measured TFLOPs: 3548.40
-
-Epoch 1 | iter 256 step 1 | loss train: 11.716, val: n/a | iter time: 1735.26 ms (step) remaining time: 4 days, 11:06:29
-Epoch 1 | iter 512 step 2 | loss train: 11.534, val: n/a | iter time: 1102.77 ms (step) remaining time: 4 days, 2:31:30
-Epoch 1 | iter 768 step 3 | loss train: 11.356, val: n/a | iter time: 1095.87 ms (step) remaining time: 3 days, 23:44:12
-Epoch 1 | iter 1024 step 4 | loss train: 11.162, val: n/a | iter time: 1099.92 ms (step) remaining time: 3 days, 22:18:27
-Epoch 1 | iter 1280 step 5 | loss train: 11.018, val: n/a | iter time: 1096.45 ms (step) remaining time: 3 days, 21:24:35
-Epoch 1 | iter 1536 step 6 | loss train: 10.901, val: n/a | iter time: 1093.65 ms (step) remaining time: 3 days, 20:48:11
-Epoch 1 | iter 1792 step 7 | loss train: 10.850, val: n/a | iter time: 1100.16 ms (step) remaining time: 3 days, 20:22:00
-Epoch 1 | iter 2048 step 8 | loss train: 10.780, val: n/a | iter time: 1092.67 ms (step) remaining time: 3 days, 20:01:57
-Epoch 1 | iter 2304 step 9 | loss train: 10.692, val: n/a | iter time: 1095.77 ms (step) remaining time: 3 days, 19:45:57
-Epoch 1 | iter 2560 step 10 | loss train: 10.678, val: n/a | iter time: 1092.12 ms (step) remaining time: 3 days, 19:32:43
-Epoch 1 | iter 2816 step 11 | loss train: 10.619, val: n/a | iter time: 1094.44 ms (step) remaining time: 3 days, 19:21:32
-Epoch 1 | iter 3072 step 12 | loss train: 10.588, val: n/a | iter time: 1102.51 ms (step) remaining time: 3 days, 19:12:30
-Epoch 1 | iter 3328 step 13 | loss train: 10.514, val: n/a | iter time: 1095.57 ms (step) remaining time: 3 days, 19:04:07
-Epoch 1 | iter 3584 step 14 | loss train: 10.472, val: n/a | iter time: 1104.00 ms (step) remaining time: 3 days, 18:56:56
-Epoch 1 | iter 3840 step 15 | loss train: 10.431, val: n/a | iter time: 1096.00 ms (step) remaining time: 3 days, 18:50:21
-Epoch 1 | iter 4096 step 16 | loss train: 10.392, val: n/a | iter time: 1098.34 ms (step) remaining time: 3 days, 18:44:25
-Epoch 1 | iter 4352 step 17 | loss train: 10.360, val: n/a | iter time: 1106.53 ms (step) remaining time: 3 days, 18:38:58
-Epoch 1 | iter 4608 step 18 | loss train: 10.329, val: n/a | iter time: 1084.95 ms (step) remaining time: 3 days, 18:33:58
-Epoch 1 | iter 4864 step 19 | loss train: 10.296, val: n/a | iter time: 1096.22 ms (step) remaining time: 3 days, 18:29:12
-Epoch 1 | iter 5120 step 20 | loss train: 10.236, val: n/a | iter time: 1093.39 ms (step) remaining time: 3 days, 18:24:51
 # ...
 ```
 
@@ -103,11 +73,11 @@ mv wandb wandb-pretrain-core
 Chat with model:
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core-0/final
 ```
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
+CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core-0/final'
 ```
 
 ```
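A quick sanity check on the dataset figures quoted in the trimmed log above: the reported token total is simply the block count times the block size. A minimal shell sketch, using only the numbers from the old log:

```bash
# len(dataset)=893355 blocks of block_size=8192 tokens each
echo $((893355 * 8192))  # prints 7318364160, matching the reported total
```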
scripts/pretrain-core-model-0.yaml
CHANGED

@@ -25,7 +25,7 @@ model_config:
 
 # Directory in which to save checkpoints and logs. If running in a Lightning Studio Job, look for it in
 # /teamspace/jobs/<job-name>/share. (type: <class 'Path'>, default: out/pretrain)
-out_dir: "../out/pretrain-core/"
+out_dir: "../out/pretrain-core-0/"
 
 # The precision to use for pretraining. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
 # precision: bf16-mixed
@@ -60,6 +60,7 @@ train:
   # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 512)
   global_batch_size: 512
   # global_batch_size: 256
+  # global_batch_size: 128
 
   # Number of samples per data-parallel rank (type: int, default: 4)
   micro_batch_size: 4
@@ -67,7 +68,7 @@ train:
   # micro_batch_size: 1
 
   # Number of iterations with learning rate warmup active (type: int, default: 2000)
-  lr_warmup_steps:
+  lr_warmup_steps: 500
 
   # Number of epochs to train on (type: Optional[int], default: null)
   epochs:
@@ -93,7 +94,7 @@ train:
 # Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
 eval:
   # Number of optimizer steps between evaluation calls (type: int, default: 1000)
-  interval:
+  interval: 100
 
   # Number of tokens to generate (type: Optional[int], default: null)
   max_new_tokens:
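For context on the batch settings retained above: litgpt derives its gradient-accumulation count from global_batch_size, micro_batch_size, and the device count. A minimal sketch of that arithmetic, assuming the single visible GPU implied by the CUDA_VISIBLE_DEVICES=0 commands in the README:

```bash
# accumulation iters = global_batch_size / (micro_batch_size * num_devices)
echo $((512 / (4 * 1)))  # 128 micro-batches accumulated per optimizer step
```

Per the config's own comments, the newly pinned lr_warmup_steps: 500 counts warmup iterations, while eval interval: 100 counts optimizer steps between validation calls.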