metadata
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_attn_part_2
    results: []

distily_bench_gpt2_attn_part_2

This student model was distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.

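It achieves the following results on the evaluation set. These values match the step 9000 row of the Eval-Phase Metrics table, not the final step 61875 row: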

  • eval_enwikippl: 218.5393
  • eval_frwikippl: 1177.8887
  • eval_zhwikippl: 654.9657
  • eval_loss: 1.2101
  • eval_runtime: 84.5457 s
  • eval_samples_per_second: 59.14
  • eval_steps_per_second: 7.392
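Judging by their names, enwikippl, frwikippl and zhwikippl are perplexities on English, French and Chinese Wikipedia text; the card does not define them further. Perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The minimal sketch below assumes the per-token natural-log probabilities of the reference tokens are already available; the function name and its input are illustrative, not part of Distily's API. It also shows that eval_loss is not on the same scale as the perplexities.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood over tokens).

    `token_log_probs` holds the natural-log probability the model assigned
    to each reference token (an illustrative input, not a Distily structure).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every token probability 1/218 has perplexity 218.
uniform = [math.log(1 / 218)] * 10
print(round(perplexity(uniform), 4))  # 218.0

# exp(eval_loss) is about 3.35, far from enwikippl = 218.5393, so eval_loss
# is presumably the distillation objective rather than a token-level NLL.
print(round(math.exp(1.2101), 2))  # 3.35
```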

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
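The distillation_objective above puts all of its weight on the logits component (loss_fn=kl, weight=1); the hidden-state (hs) and attention (attn) components have weight 0, so they contribute nothing to the loss. The following is a minimal pure-Python sketch of such a weighted objective with a KL-divergence logits loss. It assumes the forward direction KL(teacher ‖ student) computed per position over the vocabulary and averaged over positions; Distily's actual direction, temperature handling and reduction may differ, and every name here is hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_logits_loss(teacher: Sequence[Sequence[float]],
                   student: Sequence[Sequence[float]]) -> float:
    """Mean over positions of KL(teacher || student); direction is assumed."""
    total = 0.0
    for t_row, s_row in zip(teacher, student):
        t_logp = log_softmax(t_row)
        s_logp = log_softmax(s_row)
        # sum_v p_teacher(v) * (log p_teacher(v) - log p_student(v))
        total += sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))
    return total / len(teacher)


def distillation_loss(logits_loss: float, hs_loss: float, attn_loss: float,
                      w_logits: float = 1.0, w_hs: float = 0.0,
                      w_attn: float = 0.0) -> float:
    """Weighted sum of loss components; defaults mirror the weights above."""
    return w_logits * logits_loss + w_hs * hs_loss + w_attn * attn_loss


# Two positions over a toy 3-token vocabulary (hypothetical values).
teacher_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
student_logits = [[1.0, 1.0, -0.5], [0.1, 0.2, 0.3]]
kl = kl_logits_loss(teacher_logits, student_logits)
print(kl > 0)  # True: the first position's distributions differ
# With weight 0, the hs and attn terms drop out entirely.
print(distillation_loss(kl, hs_loss=5.0, attn_loss=5.0) == kl)  # True
```

Because the hs and attn weights are 0, this configuration reduces to plain logit distillation, even though the hidden-state and attention components are still listed in the objective.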

Resource Usage

Peak GPU Memory: 7.9371 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | zhwikippl |
|---|---|---|---|---|---|---|---|---|
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 57642.0977 | 56387.75 | 5.8468 | 84.9344 | 58.869 | 7.359 | 54838.8008 |
| 1000 | 0.0162 | 715.0994 | 4617.4824 | 1.8640 | 84.6424 | 59.072 | 7.384 | 16026.5967 |
| 2000 | 0.0323 | 509.6366 | 2986.9795 | 1.6848 | 84.231 | 59.361 | 7.42 | 1765.5764 |
| 3000 | 0.0485 | 427.6056 | 2689.4856 | 1.5835 | 84.4251 | 59.224 | 7.403 | 968.8549 |
| 4000 | 0.0646 | 370.2697 | 2295.2737 | 1.4992 | 84.3656 | 59.266 | 7.408 | 1006.3041 |
| 5000 | 0.0808 | 317.6974 | 1989.4651 | 1.4212 | 84.6665 | 59.055 | 7.382 | 1057.5557 |
| 6000 | 0.0970 | 285.2802 | 1632.1284 | 1.3569 | 84.6806 | 59.045 | 7.381 | 874.1838 |
| 7000 | 0.1131 | 256.3500 | 1458.8361 | 1.3018 | 84.4525 | 59.205 | 7.401 | 849.1024 |
| 8000 | 0.1293 | 237.8977 | 1317.0640 | 1.2529 | 84.5727 | 59.121 | 7.39 | 639.9225 |
| 9000 | 0.1455 | 218.5393 | 1177.8887 | 1.2101 | 84.5457 | 59.14 | 7.392 | 654.9657 |
| 10000 | 0.1616 | 202.2269 | 1126.2368 | 1.1660 | 84.6055 | 59.098 | 7.387 | 699.7226 |
| 11000 | 0.1778 | 184.4057 | 1090.2955 | 1.1237 | 84.6191 | 59.088 | 7.386 | 1601.5863 |
| 12000 | 0.1939 | 170.1912 | 972.0627 | 1.0802 | 84.6016 | 59.101 | 7.388 | 663.8590 |
| 13000 | 0.2101 | 160.8243 | 873.9529 | 1.0462 | 84.6676 | 59.054 | 7.382 | 817.5033 |
| 14000 | 0.2263 | 153.4316 | 853.4323 | 1.0216 | 84.5669 | 59.125 | 7.391 | 789.6068 |
| 15000 | 0.2424 | 144.5039 | 750.0720 | 0.9936 | 84.4757 | 59.189 | 7.399 | 497.9154 |
| 16000 | 0.2586 | 139.1196 | 713.8004 | 0.9741 | 84.6689 | 59.054 | 7.382 | 499.2470 |
| 17000 | 0.2747 | 136.0640 | 718.4965 | 0.9613 | 84.4587 | 59.201 | 7.4 | 691.9172 |
| 18000 | 0.2909 | 132.3746 | 717.3322 | 0.9476 | 84.7695 | 58.983 | 7.373 | 511.1194 |
| 19000 | 0.3071 | 131.7389 | 662.5851 | 0.9386 | 84.5077 | 59.166 | 7.396 | 483.8233 |
| 20000 | 0.3232 | 128.0474 | 670.8112 | 0.9298 | 84.837 | 58.937 | 7.367 | 464.8238 |
| 21000 | 0.3394 | 125.2350 | 678.9477 | 0.9209 | 84.6313 | 59.08 | 7.385 | 329.0456 |
| 22000 | 0.3556 | 125.6929 | 674.8433 | 0.9162 | 84.9393 | 58.866 | 7.358 | 347.1923 |
| 23000 | 0.3717 | 124.5367 | 630.1886 | 0.9094 | 85.9763 | 58.156 | 7.269 | 457.4956 |
| 24000 | 0.3879 | 123.1902 | 665.5817 | 0.9071 | 85.0672 | 58.777 | 7.347 | 311.9309 |
| 25000 | 0.4040 | 122.3417 | 641.2142 | 0.9017 | 85.0283 | 58.804 | 7.35 | 365.3137 |
| 26000 | 0.4202 | 120.2694 | 624.0430 | 0.8953 | 85.056 | 58.785 | 7.348 | 319.8610 |
| 27000 | 0.4364 | 120.1667 | 628.5027 | 0.8907 | 85.1161 | 58.743 | 7.343 | 319.6902 |
| 28000 | 0.4525 | 118.2781 | 570.9954 | 0.8846 | 85.144 | 58.724 | 7.341 | 272.5736 |
| 29000 | 0.4687 | 118.5724 | 595.4168 | 0.8842 | 85.137 | 58.729 | 7.341 | 268.9220 |
| 30000 | 0.4848 | 119.3669 | 594.5359 | 0.8814 | 84.9629 | 58.849 | 7.356 | 331.3828 |
| 31000 | 0.5010 | 117.9205 | 597.4355 | 0.8759 | 85.1582 | 58.714 | 7.339 | 352.8950 |
| 32000 | 0.5172 | 119.1076 | 616.9991 | 0.8873 | 85.3053 | 58.613 | 7.327 | 333.6027 |
| 33000 | 0.5333 | 117.0265 | 598.3629 | 0.8810 | 85.4345 | 58.524 | 7.316 | 329.0897 |
| 34000 | 0.5495 | 116.5639 | 591.6924 | 0.8745 | 85.4331 | 58.525 | 7.316 | 284.6257 |
| 35000 | 0.5657 | 116.2566 | 583.1194 | 0.8736 | 85.3062 | 58.612 | 7.327 | 312.5146 |
| 36000 | 0.5818 | 114.5094 | 569.9497 | 0.8699 | 85.3486 | 58.583 | 7.323 | 316.7582 |
| 37000 | 0.5980 | 115.3036 | 556.4886 | 0.8670 | 85.331 | 58.595 | 7.324 | 276.4224 |
| 38000 | 0.6141 | 114.4916 | 616.9991 | 0.8652 | 85.326 | 58.599 | 7.325 | 257.0539 |
| 39000 | 0.6303 | 113.8003 | 562.5639 | 0.8617 | 85.2521 | 58.65 | 7.331 | 249.4454 |
| 40000 | 0.6465 | 113.2449 | 589.2362 | 0.8608 | 85.4264 | 58.53 | 7.316 | 303.6291 |
| 41000 | 0.6626 | 113.5267 | 595.7949 | 0.8585 | 85.32 | 58.603 | 7.325 | 331.6926 |
| 42000 | 0.6788 | 112.8149 | 594.4523 | 0.8579 | 84.9462 | 58.861 | 7.358 | 352.5180 |
| 43000 | 0.6949 | 114.1189 | 599.4184 | 0.8609 | 84.9848 | 58.834 | 7.354 | 1005.4978 |
| 44000 | 0.7111 | 113.6678 | 552.8904 | 0.8595 | 85.0096 | 58.817 | 7.352 | 1579.9192 |
| 45000 | 0.7273 | 111.6644 | 655.0142 | 0.8554 | 85.0106 | 58.816 | 7.352 | 587.5825 |
| 46000 | 0.7434 | 113.8180 | 577.0257 | 0.8590 | 85.0044 | 58.821 | 7.353 | 429.3204 |
| 47000 | 0.7596 | 112.4737 | 534.5300 | 0.8557 | 84.9665 | 58.847 | 7.356 | 295.4299 |
| 48000 | 0.7758 | 112.1945 | 534.9068 | 0.8529 | 85.0374 | 58.798 | 7.35 | 355.7813 |
| 49000 | 0.7919 | 112.0117 | 588.8623 | 0.8545 | 85.2876 | 58.625 | 7.328 | 353.1778 |
| 50000 | 0.8081 | 110.6717 | 554.0220 | 0.8475 | 84.9061 | 58.889 | 7.361 | 320.5880 |
| 51000 | 0.8242 | 110.2171 | 533.4382 | 0.8444 | 84.9625 | 58.85 | 7.356 | 293.8561 |
| 52000 | 0.8404 | 109.8668 | 550.0522 | 0.8477 | 84.9033 | 58.891 | 7.361 | 292.7595 |
| 53000 | 0.8566 | 110.8953 | 522.9734 | 0.8430 | 84.9959 | 58.826 | 7.353 | 330.4548 |
| 54000 | 0.8727 | 113.6325 | 566.0253 | 0.8537 | 85.3083 | 58.611 | 7.326 | 435.0919 |
| 55000 | 0.8889 | 112.4562 | 600.7300 | 0.8536 | 85.3234 | 58.601 | 7.325 | 440.3524 |
| 56000 | 0.9051 | 112.5611 | 593.0288 | 0.8587 | 85.3061 | 58.612 | 7.327 | 713.9750 |
| 57000 | 0.9212 | 113.5531 | 569.5080 | 0.8588 | 85.2454 | 58.654 | 7.332 | 455.4841 |
| 58000 | 0.9374 | 110.8006 | 523.5637 | 0.8485 | 85.2585 | 58.645 | 7.331 | 420.0204 |
| 59000 | 0.9535 | 110.0631 | 563.5960 | 0.8472 | 85.1774 | 58.701 | 7.338 | 391.7394 |
| 60000 | 0.9697 | 109.0000 | 534.9446 | 0.8436 | 85.2097 | 58.679 | 7.335 | 323.9015 |
| 61000 | 0.9859 | 112.5698 | 560.3074 | 0.8420 | 85.18 | 58.699 | 7.337 | 256.8824 |
| 61875 | 1.0 | 109.4750 | 562.8416 | 0.8412 | 85.1958 | 58.688 | 7.336 | 313.6015 |
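The step and epoch columns are consistent with one epoch of 61,875 optimizer steps (epoch = step / 61875, rounded to four decimals). Each row's runtime and throughput columns imply roughly 5,000 evaluation samples processed in about 625 batches of eval_batch_size 8. The minimal sketch below checks this arithmetic against values copied from the table. The training-sample count it prints assumes a single device with no gradient accumulation, which the card does not state.

```python
from typing import Tuple

TOTAL_STEPS = 61875      # final step in the table, where epoch reaches 1.0
TRAIN_BATCH_SIZE = 8     # train_batch_size from the hyperparameters
EVAL_BATCH_SIZE = 8      # eval_batch_size from the hyperparameters


def epoch_at(step: int) -> float:
    """Fraction of the single training epoch completed at `step`."""
    return round(step / TOTAL_STEPS, 4)


def implied_eval_counts(runtime_s: float, samples_per_s: float,
                        steps_per_s: float) -> Tuple[int, int]:
    """Recover (eval samples, eval batches) from runtime and throughput."""
    return round(runtime_s * samples_per_s), round(runtime_s * steps_per_s)


# Spot-check the epoch column for a few rows.
print(epoch_at(1000), epoch_at(9000), epoch_at(61000))  # 0.0162 0.1455 0.9859

# Step 9000 row: runtime 84.5457 s, 59.14 samples/s, 7.392 steps/s.
samples, batches = implied_eval_counts(84.5457, 59.14, 7.392)
print(samples, batches, batches * EVAL_BATCH_SIZE)  # 5000 625 5000

# Training samples seen in one epoch (assumes one device, no accumulation).
print(TOTAL_STEPS * TRAIN_BATCH_SIZE)  # 495000
```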

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.20.0