prepare datasets
- README.md +5 -35
- scripts/pretrain-core-model-0.yaml +4 -3
    	
        README.md
    CHANGED
    
    | @@ -44,7 +44,7 @@ tags: | |
| 44 | 
             
            - reason
         | 
| 45 | 
             
            ---
         | 
| 46 |  | 
| 47 | 
            -
            # tangled-alpha-0. | 
| 48 |  | 
| 49 | 
             
            
         | 
| 50 |  | 
| @@ -53,44 +53,14 @@ time python -B prepare_core_datasets.py | |
| 53 | 
             
            ```
         | 
| 54 |  | 
| 55 | 
             
            ```
         | 
| 56 | 
            -
             | 
| 57 | 
            -
            Workers are finished.██| 220/220 [23:15<00:00,  6.34s/it]
         | 
| 58 | 
            -
            Finished data processing!
         | 
| 59 | 
            -
            i=0, block_size=8192, chunk_size=16384000, len(dataset)=893355, len(dataset) * block_size=7318364160
         | 
| 60 | 
            -
            Total number of tokens in the optimized dataset '../core-data-0-8192-2000' is 7318364160
         | 
| 61 | 
             
            ```
         | 
| 62 |  | 
| 63 | 
             
            ```bash
         | 
| 64 | 
            -
            CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
         | 
| 65 | 
             
            ```
         | 
| 66 |  | 
| 67 | 
             
            ```
         | 
| 68 | 
            -
            Seed set to 23
         | 
| 69 | 
            -
            Time to instantiate model: 0.32 seconds.
         | 
| 70 | 
            -
            Total parameters: 217,088,512
         | 
| 71 | 
            -
            Verifying settings ...
         | 
| 72 | 
            -
            Measured TFLOPs: 3548.40
         | 
| 73 | 
            -
             | 
| 74 | 
            -
            Epoch 1 | iter 256 step 1 | loss train: 11.716, val: n/a | iter time: 1735.26 ms (step) remaining time: 4 days, 11:06:29
         | 
| 75 | 
            -
            Epoch 1 | iter 512 step 2 | loss train: 11.534, val: n/a | iter time: 1102.77 ms (step) remaining time: 4 days, 2:31:30
         | 
| 76 | 
            -
            Epoch 1 | iter 768 step 3 | loss train: 11.356, val: n/a | iter time: 1095.87 ms (step) remaining time: 3 days, 23:44:12
         | 
| 77 | 
            -
            Epoch 1 | iter 1024 step 4 | loss train: 11.162, val: n/a | iter time: 1099.92 ms (step) remaining time: 3 days, 22:18:27
         | 
| 78 | 
            -
            Epoch 1 | iter 1280 step 5 | loss train: 11.018, val: n/a | iter time: 1096.45 ms (step) remaining time: 3 days, 21:24:35
         | 
| 79 | 
            -
            Epoch 1 | iter 1536 step 6 | loss train: 10.901, val: n/a | iter time: 1093.65 ms (step) remaining time: 3 days, 20:48:11
         | 
| 80 | 
            -
            Epoch 1 | iter 1792 step 7 | loss train: 10.850, val: n/a | iter time: 1100.16 ms (step) remaining time: 3 days, 20:22:00
         | 
| 81 | 
            -
            Epoch 1 | iter 2048 step 8 | loss train: 10.780, val: n/a | iter time: 1092.67 ms (step) remaining time: 3 days, 20:01:57
         | 
| 82 | 
            -
            Epoch 1 | iter 2304 step 9 | loss train: 10.692, val: n/a | iter time: 1095.77 ms (step) remaining time: 3 days, 19:45:57
         | 
| 83 | 
            -
            Epoch 1 | iter 2560 step 10 | loss train: 10.678, val: n/a | iter time: 1092.12 ms (step) remaining time: 3 days, 19:32:43
         | 
| 84 | 
            -
            Epoch 1 | iter 2816 step 11 | loss train: 10.619, val: n/a | iter time: 1094.44 ms (step) remaining time: 3 days, 19:21:32
         | 
| 85 | 
            -
            Epoch 1 | iter 3072 step 12 | loss train: 10.588, val: n/a | iter time: 1102.51 ms (step) remaining time: 3 days, 19:12:30
         | 
| 86 | 
            -
            Epoch 1 | iter 3328 step 13 | loss train: 10.514, val: n/a | iter time: 1095.57 ms (step) remaining time: 3 days, 19:04:07
         | 
| 87 | 
            -
            Epoch 1 | iter 3584 step 14 | loss train: 10.472, val: n/a | iter time: 1104.00 ms (step) remaining time: 3 days, 18:56:56
         | 
| 88 | 
            -
            Epoch 1 | iter 3840 step 15 | loss train: 10.431, val: n/a | iter time: 1096.00 ms (step) remaining time: 3 days, 18:50:21
         | 
| 89 | 
            -
            Epoch 1 | iter 4096 step 16 | loss train: 10.392, val: n/a | iter time: 1098.34 ms (step) remaining time: 3 days, 18:44:25
         | 
| 90 | 
            -
            Epoch 1 | iter 4352 step 17 | loss train: 10.360, val: n/a | iter time: 1106.53 ms (step) remaining time: 3 days, 18:38:58
         | 
| 91 | 
            -
            Epoch 1 | iter 4608 step 18 | loss train: 10.329, val: n/a | iter time: 1084.95 ms (step) remaining time: 3 days, 18:33:58
         | 
| 92 | 
            -
            Epoch 1 | iter 4864 step 19 | loss train: 10.296, val: n/a | iter time: 1096.22 ms (step) remaining time: 3 days, 18:29:12
         | 
| 93 | 
            -
            Epoch 1 | iter 5120 step 20 | loss train: 10.236, val: n/a | iter time: 1093.39 ms (step) remaining time: 3 days, 18:24:51
         | 
| 94 | 
             
            # ...
         | 
| 95 | 
             
            ```
         | 
| 96 |  | 
| @@ -103,11 +73,11 @@ mv wandb wandb-pretrain-core | |
| 103 | 
             
            Chat with model:
         | 
| 104 |  | 
| 105 | 
             
            ```bash
         | 
| 106 | 
            -
            CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final
         | 
| 107 | 
             
            ```
         | 
| 108 |  | 
| 109 | 
             
            ```bash
         | 
| 110 | 
            -
            CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
         | 
| 111 | 
             
            ```
         | 
| 112 |  | 
| 113 | 
             
            ```
         | 
|  | |
| 44 | 
             
            - reason
         | 
| 45 | 
             
            ---
         | 
| 46 |  | 
| 47 | 
            +
            # tangled-alpha-0.3-core
         | 
| 48 |  | 
| 49 | 
             
            
         | 
| 50 |  | 
|  | |
| 53 | 
             
            ```
         | 
| 54 |  | 
| 55 | 
             
            ```
         | 
| 56 | 
            +
            # ...
         | 
| 57 | 
             
            ```
         | 
| 58 |  | 
| 59 | 
             
            ```bash
         | 
| 60 | 
            +
            CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model-0.yaml
         | 
| 61 | 
             
            ```
         | 
| 62 |  | 
| 63 | 
             
            ```
         | 
| 64 | 
             
            # ...
         | 
| 65 | 
             
            ```
         | 
| 66 |  | 
|  | |
| 73 | 
             
            Chat with model:
         | 
| 74 |  | 
| 75 | 
             
            ```bash
         | 
| 76 | 
            +
            CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core-0/final
         | 
| 77 | 
             
            ```
         | 
| 78 |  | 
| 79 | 
             
            ```bash
         | 
| 80 | 
            +
            CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core-0/final'
         | 
| 81 | 
             
            ```
         | 
| 82 |  | 
| 83 | 
             
            ```
         | 
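The log output removed from README.md reports `len(dataset)=893355`, `block_size=8192`, `chunk_size=16384000`, and a total of 7318364160 tokens. A minimal sketch that re-checks the arithmetic; the variable names are illustrative and are not taken from `prepare_core_datasets.py`:

```python
# Values copied from the log line removed from README.md.
num_blocks = 893_355      # len(dataset)
block_size = 8_192        # tokens per block
chunk_size = 16_384_000   # tokens per chunk

# Total tokens in the optimized dataset: blocks times tokens per block.
total_tokens = num_blocks * block_size
print(total_tokens)  # 7318364160

# Blocks that fit in one chunk.
blocks_per_chunk = chunk_size // block_size
print(blocks_per_chunk)  # 2000
```

The 2000 blocks per chunk lines up with the `8192-2000` suffix in the dataset directory name `../core-data-0-8192-2000`, although the diff itself does not state how that name was chosen.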
    	
        scripts/pretrain-core-model-0.yaml
    CHANGED
    
    | @@ -25,7 +25,7 @@ model_config: | |
| 25 |  | 
| 26 | 
             
            # Directory in which to save checkpoints and logs. If running in a Lightning Studio Job, look for it in
         | 
| 27 | 
             
            # /teamspace/jobs/<job-name>/share. (type: <class 'Path'>, default: out/pretrain)
         | 
| 28 | 
            -
            out_dir: "../out/pretrain-core/"
         | 
| 29 |  | 
| 30 | 
             
            # The precision to use for pretraining. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
         | 
| 31 | 
             
            # precision: bf16-mixed
         | 
| @@ -60,6 +60,7 @@ train: | |
| 60 | 
             
              # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 512)
         | 
| 61 | 
             
              global_batch_size: 512
         | 
| 62 | 
             
              # global_batch_size: 256
         | 
|  | |
| 63 |  | 
| 64 | 
             
              # Number of samples per data-parallel rank (type: int, default: 4)
         | 
| 65 | 
             
              micro_batch_size: 4
         | 
| @@ -67,7 +68,7 @@ train: | |
| 67 | 
             
              # micro_batch_size: 1
         | 
| 68 |  | 
| 69 | 
             
              # Number of iterations with learning rate warmup active (type: int, default: 2000)
         | 
| 70 | 
            -
              lr_warmup_steps:  | 
| 71 |  | 
| 72 | 
             
              # Number of epochs to train on (type: Optional[int], default: null)
         | 
| 73 | 
             
              epochs:
         | 
| @@ -93,7 +94,7 @@ train: | |
| 93 | 
             
            # Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
         | 
| 94 | 
             
            eval:
         | 
| 95 | 
             
              # Number of optimizer steps between evaluation calls (type: int, default: 1000)
         | 
| 96 | 
            -
              interval:  | 
| 97 |  | 
| 98 | 
             
              # Number of tokens to generate (type: Optional[int], default: null)
         | 
| 99 | 
             
              max_new_tokens:
         | 
|  | |
| 25 |  | 
| 26 | 
             
            # Directory in which to save checkpoints and logs. If running in a Lightning Studio Job, look for it in
         | 
| 27 | 
             
            # /teamspace/jobs/<job-name>/share. (type: <class 'Path'>, default: out/pretrain)
         | 
| 28 | 
            +
            out_dir: "../out/pretrain-core-0/"
         | 
| 29 |  | 
| 30 | 
             
            # The precision to use for pretraining. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
         | 
| 31 | 
             
            # precision: bf16-mixed
         | 
|  | |
| 60 | 
             
              # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 512)
         | 
| 61 | 
             
              global_batch_size: 512
         | 
| 62 | 
             
              # global_batch_size: 256
         | 
| 63 | 
            +
              # global_batch_size: 128
         | 
| 64 |  | 
| 65 | 
             
              # Number of samples per data-parallel rank (type: int, default: 4)
         | 
| 66 | 
             
              micro_batch_size: 4
         | 
|  | |
| 68 | 
             
              # micro_batch_size: 1
         | 
| 69 |  | 
| 70 | 
             
              # Number of iterations with learning rate warmup active (type: int, default: 2000)
         | 
| 71 | 
            +
              lr_warmup_steps: 500
         | 
| 72 |  | 
| 73 | 
             
              # Number of epochs to train on (type: Optional[int], default: null)
         | 
| 74 | 
             
              epochs:
         | 
|  | |
| 94 | 
             
            # Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
         | 
| 95 | 
             
            eval:
         | 
| 96 | 
             
              # Number of optimizer steps between evaluation calls (type: int, default: 1000)
         | 
| 97 | 
            +
              interval: 100
         | 
| 98 |  | 
| 99 | 
             
              # Number of tokens to generate (type: Optional[int], default: null)
         | 
| 100 | 
             
              max_new_tokens:
         | 
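The `train` and `eval` values in this config can be related with a little arithmetic. The sketch below assumes a single device (the README commands set `CUDA_VISIBLE_DEVICES=0`) and the common convention that the per-device batch is split evenly into micro-batches. The function name is hypothetical and this is not litgpt's own code:

```python
def accumulation_iters(global_batch_size: int, micro_batch_size: int, devices: int = 1) -> int:
    """Micro-batches accumulated per optimizer step (assumed convention)."""
    per_device_batch = global_batch_size // devices
    if per_device_batch % micro_batch_size != 0:
        raise ValueError("per-device batch must be a multiple of micro_batch_size")
    return per_device_batch // micro_batch_size


# Values from scripts/pretrain-core-model-0.yaml after this commit.
global_batch_size = 512
micro_batch_size = 4
eval_interval = 100  # optimizer steps between evaluation calls

iters_per_step = accumulation_iters(global_batch_size, micro_batch_size)
samples_between_evals = eval_interval * global_batch_size

print(iters_per_step)         # 128
print(samples_between_evals)  # 51200
```

Under these assumptions, the commented-out alternatives `global_batch_size: 256` and `global_batch_size: 128` would give 64 and 32 micro-batches per step with the same `micro_batch_size: 4`.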

