---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_attn_part_2
  results: []
---

distily_bench_gpt2_attn_part_2

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 215.4055
  • eval_frwikippl: 1190.7479
  • eval_zhwikippl: 547.2146
  • eval_loss: 1.2012
  • eval_runtime: 86.3928
  • eval_samples_per_second: 57.875
  • eval_steps_per_second: 7.234
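The eval_enwikippl, eval_frwikippl, and eval_zhwikippl metrics above are perplexities on English, French, and Chinese Wikipedia slices. As a reminder of how such a number relates to the model's loss, perplexity is the exponential of the mean per-token negative log-likelihood. A minimal pure-Python sketch with illustrative numbers (not values from this run):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# illustrative per-token NLLs in nats, not from this evaluation
print(perplexity([2.0, 2.0, 2.0]))  # exp(2.0) ≈ 7.389
```

Note that the dataset-level perplexities above are computed over each Wikipedia slice separately, which is why they differ from exp(eval_loss).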

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
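The distillation_objective above combines a KL-divergence loss on the logits (weight 1) with an MSE loss on the attention maps (weight 2.0); the hidden-state component has weight 0 and is inactive. The following toy, pure-Python sketch shows how such a weighted objective combines its terms. It is illustrative only, not Distily's actual implementation, and the scalar "tensors" stand in for real logit and attention arrays:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    """Mean squared error between two flat attention maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_loss(t_logits, s_logits, t_attn, s_attn,
                 logits_weight=1.0, attn_weight=2.0):
    # Weights match the DistillationObjective above:
    # logits KL has weight 1, attention MSE has weight 2.0.
    logits_loss = kl(softmax(t_logits), softmax(s_logits))
    attn_loss = mse(t_attn, s_attn)
    return logits_weight * logits_loss + attn_weight * attn_loss

# toy values for a single position / single attention head
print(distill_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3],
                   [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```

When student and teacher agree exactly, both terms vanish and the loss is zero; the attention term is doubled relative to the logits term, matching the weight=2.0 in the config.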

Resource Usage

Peak GPU Memory: 8.2206 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 56314.7695 | 59887.2773 | 5.8256 | 86.2439 | 57.975 | 7.247 | 59033.8086 |
| 1000 | 0.0162 | 707.4770 | 4242.8809 | 1.8516 | 86.1491 | 58.039 | 7.255 | 11038.7695 |
| 2000 | 0.0323 | 507.7405 | 3239.7178 | 1.6796 | 86.186 | 58.014 | 7.252 | 1887.9902 |
| 3000 | 0.0485 | 425.0894 | 2858.4150 | 1.5756 | 86.0159 | 58.129 | 7.266 | 841.8765 |
| 4000 | 0.0646 | 361.4349 | 2351.2927 | 1.4943 | 86.0626 | 58.097 | 7.262 | 1237.3851 |
| 5000 | 0.0808 | 320.2736 | 1811.6420 | 1.4160 | 86.3077 | 57.932 | 7.242 | 941.8109 |
| 6000 | 0.0970 | 279.4263 | 1586.2935 | 1.3478 | 86.3392 | 57.911 | 7.239 | 744.3502 |
| 7000 | 0.1131 | 252.5366 | 1452.6782 | 1.2903 | 86.3844 | 57.881 | 7.235 | 651.1284 |
| 8000 | 0.1293 | 229.7639 | 1333.1338 | 1.2422 | 86.4019 | 57.869 | 7.234 | 586.1718 |
| 9000 | 0.1455 | 215.4055 | 1190.7479 | 1.2012 | 86.3928 | 57.875 | 7.234 | 547.2146 |
| 10000 | 0.1616 | 195.7073 | 1147.2347 | 1.1512 | 86.3689 | 57.891 | 7.236 | 673.5028 |
| 11000 | 0.1778 | 181.4088 | 1060.8735 | 1.1073 | 86.4921 | 57.809 | 7.226 | 521.8091 |
| 12000 | 0.1939 | 164.0534 | 896.9886 | 1.0636 | 86.3399 | 57.911 | 7.239 | 488.8237 |
| 13000 | 0.2101 | 157.4142 | 890.0587 | 1.0357 | 86.4286 | 57.851 | 7.231 | 510.7101 |
| 14000 | 0.2263 | 148.5198 | 793.2602 | 1.0069 | 86.4451 | 57.84 | 7.23 | 415.8904 |
| 15000 | 0.2424 | 143.5310 | 728.5455 | 0.9844 | 86.5014 | 57.803 | 7.225 | 414.5595 |
| 16000 | 0.2586 | 139.7042 | 766.6470 | 0.9726 | 86.5584 | 57.764 | 7.221 | 539.9557 |
| 17000 | 0.2747 | 136.4025 | 723.4780 | 0.9594 | 86.3816 | 57.883 | 7.235 | 877.2245 |
| 18000 | 0.2909 | 133.8320 | 733.1834 | 0.9461 | 86.4657 | 57.826 | 7.228 | 582.4266 |
| 19000 | 0.3071 | 130.4055 | 720.7795 | 0.9391 | 86.5854 | 57.746 | 7.218 | 564.7347 |
| 20000 | 0.3232 | 128.2763 | 679.3307 | 0.9259 | 86.469 | 57.824 | 7.228 | 364.2420 |
| 21000 | 0.3394 | 126.0545 | 666.4741 | 0.9208 | 86.3084 | 57.932 | 7.241 | 392.6297 |
| 22000 | 0.3556 | 126.3289 | 618.9599 | 0.9146 | 86.2819 | 57.95 | 7.244 | 383.1512 |
| 23000 | 0.3717 | 125.7710 | 652.6170 | 0.9106 | 86.3709 | 57.89 | 7.236 | 382.0272 |
| 24000 | 0.3879 | 121.7352 | 649.1292 | 0.9010 | 86.4132 | 57.862 | 7.233 | 407.5338 |
| 25000 | 0.4040 | 121.2164 | 677.1313 | 0.8985 | 86.5605 | 57.763 | 7.22 | 378.4727 |
| 26000 | 0.4202 | 121.4331 | 604.5543 | 0.8920 | 86.6149 | 57.727 | 7.216 | 400.5201 |
| 27000 | 0.4364 | 121.4896 | 636.5748 | 0.8898 | 86.977 | 57.486 | 7.186 | 344.3297 |
| 28000 | 0.4525 | 120.0641 | 614.8710 | 0.8867 | 86.9971 | 57.473 | 7.184 | 385.8209 |
| 29000 | 0.4687 | 121.5085 | 662.3517 | 0.8855 | 86.6921 | 57.675 | 7.209 | 386.8527 |
| 30000 | 0.4848 | 121.3954 | 620.4891 | 0.8915 | 86.9396 | 57.511 | 7.189 | 805.0448 |
| 31000 | 0.5010 | 119.1724 | 604.0428 | 0.8831 | 87.0473 | 57.44 | 7.18 | 382.2313 |
| 32000 | 0.5172 | 118.1496 | 632.1021 | 0.8800 | 87.0169 | 57.46 | 7.183 | 377.2617 |
| 33000 | 0.5333 | 116.5277 | 597.8567 | 0.8738 | 86.7512 | 57.636 | 7.205 | 322.2620 |
| 34000 | 0.5495 | 116.1844 | 591.6924 | 0.8734 | 87.2311 | 57.319 | 7.165 | 431.3317 |
| 35000 | 0.5657 | 115.5994 | 565.9454 | 0.8686 | 86.8167 | 57.593 | 7.199 | 336.3313 |
| 36000 | 0.5818 | 115.9320 | 609.9918 | 0.8674 | 87.1488 | 57.373 | 7.172 | 253.6102 |
| 37000 | 0.5980 | 115.0621 | 595.2911 | 0.8660 | 87.1004 | 57.405 | 7.176 | 323.4260 |
| 38000 | 0.6141 | 115.5635 | 590.6086 | 0.8654 | 86.9067 | 57.533 | 7.192 | 282.2412 |
| 39000 | 0.6303 | 113.5796 | 546.1489 | 0.8586 | 86.5012 | 57.803 | 7.225 | 306.1125 |
| 40000 | 0.6465 | 113.4385 | 558.4144 | 0.8583 | 86.6261 | 57.719 | 7.215 | 246.7947 |
| 41000 | 0.6626 | 112.7097 | 563.5562 | 0.8558 | 86.9289 | 57.518 | 7.19 | 263.4834 |
| 42000 | 0.6788 | 112.6048 | 556.9202 | 0.8573 | 86.8975 | 57.539 | 7.192 | 287.7979 |
| 43000 | 0.6949 | 112.9025 | 569.7087 | 0.8534 | 86.3213 | 57.923 | 7.24 | 295.2722 |
| 44000 | 0.7111 | 111.3180 | 584.7252 | 0.8534 | 86.7833 | 57.615 | 7.202 | 311.5563 |
| 45000 | 0.7273 | 112.7623 | 589.8597 | 0.8520 | 85.8832 | 58.219 | 7.277 | 452.9366 |
| 46000 | 0.7434 | 111.0763 | 583.6953 | 0.8497 | 86.9028 | 57.536 | 7.192 | 323.7285 |
| 47000 | 0.7596 | 110.0631 | 570.5529 | 0.8481 | 86.1396 | 58.045 | 7.256 | 278.4229 |
| 48000 | 0.7758 | 112.4039 | 498.8431 | 0.8470 | 86.0091 | 58.133 | 7.267 | 315.6181 |
| 49000 | 0.7919 | 111.2748 | 564.9885 | 0.8465 | 86.4014 | 57.869 | 7.234 | 261.0319 |
| 50000 | 0.8081 | 111.5950 | 594.9554 | 0.8454 | 87.4501 | 57.175 | 7.147 | 240.7725 |
| 51000 | 0.8242 | 110.0546 | 563.8345 | 0.8446 | 85.9134 | 58.198 | 7.275 | 320.1174 |
| 52000 | 0.8404 | 109.2966 | 548.4256 | 0.8428 | 86.5788 | 57.751 | 7.219 | 318.7099 |
| 53000 | 0.8566 | 109.3136 | 539.9846 | 0.8395 | 86.3394 | 57.911 | 7.239 | 340.8982 |
| 54000 | 0.8727 | 110.7834 | 561.4149 | 0.8436 | 86.2011 | 58.004 | 7.25 | 361.5285 |
| 55000 | 0.8889 | 110.2941 | 576.0907 | 0.8421 | 86.733 | 57.648 | 7.206 | 297.2107 |
| 56000 | 0.9051 | 109.5600 | 571.4385 | 0.8433 | 86.2508 | 57.97 | 7.246 | 370.3730 |
| 57000 | 0.9212 | 109.7474 | 566.3444 | 0.8457 | 86.5407 | 57.776 | 7.222 | 900.0065 |
| 58000 | 0.9374 | 109.4155 | 621.2332 | 0.8426 | 86.3669 | 57.893 | 7.237 | 493.3487 |
| 59000 | 0.9535 | 110.1230 | 581.3542 | 0.8391 | 86.4324 | 57.849 | 7.231 | 272.2826 |
| 60000 | 0.9697 | 108.2997 | 582.5030 | 0.8340 | 86.046 | 58.108 | 7.264 | 323.5555 |
| 61000 | 0.9859 | 109.2711 | 566.8240 | 0.8381 | 86.749 | 57.638 | 7.205 | 312.4312 |
| 61875 | 1.0 | 109.1439 | 575.3599 | 0.8346 | 86.8825 | 57.549 | 7.194 | 265.7449 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.20.0