---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_attn_part_2
  results: []
---
distily_bench_gpt2_attn_part_2
This student model was distilled from the teacher model gpt2 with the Distily library. The training dataset is not specified.
It achieves the following results on the evaluation set. These values match the step 9000 row of the eval-phase metrics table, not the final step (61875):
- eval_enwikippl: 215.4055
- eval_frwikippl: 1190.7479
- eval_zhwikippl: 547.2146
- eval_loss: 1.2012
- eval_runtime: 86.3928
- eval_samples_per_second: 57.875
- eval_steps_per_second: 7.234
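The throughput figures above can be cross-checked with simple arithmetic. The sketch below assumes that `eval_runtime` is in seconds and that both rates are averages over the whole evaluation run, as is conventional for Trainer-style metrics; the card itself does not state this. The helper name `implied_counts` is hypothetical.

```python
# Values copied from the evaluation metrics listed above.
eval_runtime = 86.3928            # assumed to be seconds
eval_samples_per_second = 57.875
eval_steps_per_second = 7.234


def implied_counts(runtime, samples_per_second, steps_per_second):
    """Return (samples, steps, samples_per_step) implied by the throughput figures."""
    samples = runtime * samples_per_second
    steps = runtime * steps_per_second
    return samples, steps, samples / steps


samples, steps, per_step = implied_counts(
    eval_runtime, eval_samples_per_second, eval_steps_per_second
)
print(round(samples), round(steps), round(per_step))  # prints "5000 625 8"
```

Under these assumptions the evaluation set holds about 5000 samples processed in about 625 steps, which agrees with the `eval_batch_size` of 8 listed under the training hyperparameters.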
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
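The `distillation_objective` above combines a KL term on the logits (weight 1) with an MSE term on the attentions (weight 2.0); the hidden-state component has weight 0 and drops out. The following is a minimal pure-Python sketch of that weighted sum on toy lists. It is not Distily's implementation: the function names are hypothetical, the KL direction (teacher relative to student) is an assumption, and real training operates on batched tensors across layers and heads.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one distribution (direction is an assumption)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(teacher_values, student_values):
    """Mean squared error between two equally sized lists."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_values, student_values)]
    return sum(diffs) / len(diffs)


def distillation_loss(teacher_logits, student_logits, teacher_attn, student_attn,
                      logits_weight=1.0, attn_weight=2.0):
    """Weighted sum of the logits KL term and the attention MSE term.

    The default weights mirror the values listed in the card; the hidden-state
    term is omitted because its weight is 0.
    """
    return (logits_weight * kl_divergence(teacher_logits, student_logits)
            + attn_weight * mse(teacher_attn, student_attn))


# Toy usage: identical logits, attention rows that disagree completely.
loss = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 0.0], [0.0, 1.0])
```

With identical logits the KL term vanishes, so the toy loss reduces to the attention weight times the MSE.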
Resource Usage
Peak GPU Memory: 8.2206 GB
Eval-Phase Metrics
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
|---|---|---|---|---|---|---|---|---|
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 56314.7695 | 59887.2773 | 5.8256 | 86.2439 | 57.975 | 7.247 | 59033.8086 |
| 1000 | 0.0162 | 707.4770 | 4242.8809 | 1.8516 | 86.1491 | 58.039 | 7.255 | 11038.7695 |
| 2000 | 0.0323 | 507.7405 | 3239.7178 | 1.6796 | 86.186 | 58.014 | 7.252 | 1887.9902 |
| 3000 | 0.0485 | 425.0894 | 2858.4150 | 1.5756 | 86.0159 | 58.129 | 7.266 | 841.8765 |
| 4000 | 0.0646 | 361.4349 | 2351.2927 | 1.4943 | 86.0626 | 58.097 | 7.262 | 1237.3851 |
| 5000 | 0.0808 | 320.2736 | 1811.6420 | 1.4160 | 86.3077 | 57.932 | 7.242 | 941.8109 |
| 6000 | 0.0970 | 279.4263 | 1586.2935 | 1.3478 | 86.3392 | 57.911 | 7.239 | 744.3502 |
| 7000 | 0.1131 | 252.5366 | 1452.6782 | 1.2903 | 86.3844 | 57.881 | 7.235 | 651.1284 |
| 8000 | 0.1293 | 229.7639 | 1333.1338 | 1.2422 | 86.4019 | 57.869 | 7.234 | 586.1718 |
| 9000 | 0.1455 | 215.4055 | 1190.7479 | 1.2012 | 86.3928 | 57.875 | 7.234 | 547.2146 |
| 10000 | 0.1616 | 195.7073 | 1147.2347 | 1.1512 | 86.3689 | 57.891 | 7.236 | 673.5028 |
| 11000 | 0.1778 | 181.4088 | 1060.8735 | 1.1073 | 86.4921 | 57.809 | 7.226 | 521.8091 |
| 12000 | 0.1939 | 164.0534 | 896.9886 | 1.0636 | 86.3399 | 57.911 | 7.239 | 488.8237 |
| 13000 | 0.2101 | 157.4142 | 890.0587 | 1.0357 | 86.4286 | 57.851 | 7.231 | 510.7101 |
| 14000 | 0.2263 | 148.5198 | 793.2602 | 1.0069 | 86.4451 | 57.84 | 7.23 | 415.8904 |
| 15000 | 0.2424 | 143.5310 | 728.5455 | 0.9844 | 86.5014 | 57.803 | 7.225 | 414.5595 |
| 16000 | 0.2586 | 139.7042 | 766.6470 | 0.9726 | 86.5584 | 57.764 | 7.221 | 539.9557 |
| 17000 | 0.2747 | 136.4025 | 723.4780 | 0.9594 | 86.3816 | 57.883 | 7.235 | 877.2245 |
| 18000 | 0.2909 | 133.8320 | 733.1834 | 0.9461 | 86.4657 | 57.826 | 7.228 | 582.4266 |
| 19000 | 0.3071 | 130.4055 | 720.7795 | 0.9391 | 86.5854 | 57.746 | 7.218 | 564.7347 |
| 20000 | 0.3232 | 128.2763 | 679.3307 | 0.9259 | 86.469 | 57.824 | 7.228 | 364.2420 |
| 21000 | 0.3394 | 126.0545 | 666.4741 | 0.9208 | 86.3084 | 57.932 | 7.241 | 392.6297 |
| 22000 | 0.3556 | 126.3289 | 618.9599 | 0.9146 | 86.2819 | 57.95 | 7.244 | 383.1512 |
| 23000 | 0.3717 | 125.7710 | 652.6170 | 0.9106 | 86.3709 | 57.89 | 7.236 | 382.0272 |
| 24000 | 0.3879 | 121.7352 | 649.1292 | 0.9010 | 86.4132 | 57.862 | 7.233 | 407.5338 |
| 25000 | 0.4040 | 121.2164 | 677.1313 | 0.8985 | 86.5605 | 57.763 | 7.22 | 378.4727 |
| 26000 | 0.4202 | 121.4331 | 604.5543 | 0.8920 | 86.6149 | 57.727 | 7.216 | 400.5201 |
| 27000 | 0.4364 | 121.4896 | 636.5748 | 0.8898 | 86.977 | 57.486 | 7.186 | 344.3297 |
| 28000 | 0.4525 | 120.0641 | 614.8710 | 0.8867 | 86.9971 | 57.473 | 7.184 | 385.8209 |
| 29000 | 0.4687 | 121.5085 | 662.3517 | 0.8855 | 86.6921 | 57.675 | 7.209 | 386.8527 |
| 30000 | 0.4848 | 121.3954 | 620.4891 | 0.8915 | 86.9396 | 57.511 | 7.189 | 805.0448 |
| 31000 | 0.5010 | 119.1724 | 604.0428 | 0.8831 | 87.0473 | 57.44 | 7.18 | 382.2313 |
| 32000 | 0.5172 | 118.1496 | 632.1021 | 0.8800 | 87.0169 | 57.46 | 7.183 | 377.2617 |
| 33000 | 0.5333 | 116.5277 | 597.8567 | 0.8738 | 86.7512 | 57.636 | 7.205 | 322.2620 |
| 34000 | 0.5495 | 116.1844 | 591.6924 | 0.8734 | 87.2311 | 57.319 | 7.165 | 431.3317 |
| 35000 | 0.5657 | 115.5994 | 565.9454 | 0.8686 | 86.8167 | 57.593 | 7.199 | 336.3313 |
| 36000 | 0.5818 | 115.9320 | 609.9918 | 0.8674 | 87.1488 | 57.373 | 7.172 | 253.6102 |
| 37000 | 0.5980 | 115.0621 | 595.2911 | 0.8660 | 87.1004 | 57.405 | 7.176 | 323.4260 |
| 38000 | 0.6141 | 115.5635 | 590.6086 | 0.8654 | 86.9067 | 57.533 | 7.192 | 282.2412 |
| 39000 | 0.6303 | 113.5796 | 546.1489 | 0.8586 | 86.5012 | 57.803 | 7.225 | 306.1125 |
| 40000 | 0.6465 | 113.4385 | 558.4144 | 0.8583 | 86.6261 | 57.719 | 7.215 | 246.7947 |
| 41000 | 0.6626 | 112.7097 | 563.5562 | 0.8558 | 86.9289 | 57.518 | 7.19 | 263.4834 |
| 42000 | 0.6788 | 112.6048 | 556.9202 | 0.8573 | 86.8975 | 57.539 | 7.192 | 287.7979 |
| 43000 | 0.6949 | 112.9025 | 569.7087 | 0.8534 | 86.3213 | 57.923 | 7.24 | 295.2722 |
| 44000 | 0.7111 | 111.3180 | 584.7252 | 0.8534 | 86.7833 | 57.615 | 7.202 | 311.5563 |
| 45000 | 0.7273 | 112.7623 | 589.8597 | 0.8520 | 85.8832 | 58.219 | 7.277 | 452.9366 |
| 46000 | 0.7434 | 111.0763 | 583.6953 | 0.8497 | 86.9028 | 57.536 | 7.192 | 323.7285 |
| 47000 | 0.7596 | 110.0631 | 570.5529 | 0.8481 | 86.1396 | 58.045 | 7.256 | 278.4229 |
| 48000 | 0.7758 | 112.4039 | 498.8431 | 0.8470 | 86.0091 | 58.133 | 7.267 | 315.6181 |
| 49000 | 0.7919 | 111.2748 | 564.9885 | 0.8465 | 86.4014 | 57.869 | 7.234 | 261.0319 |
| 50000 | 0.8081 | 111.5950 | 594.9554 | 0.8454 | 87.4501 | 57.175 | 7.147 | 240.7725 |
| 51000 | 0.8242 | 110.0546 | 563.8345 | 0.8446 | 85.9134 | 58.198 | 7.275 | 320.1174 |
| 52000 | 0.8404 | 109.2966 | 548.4256 | 0.8428 | 86.5788 | 57.751 | 7.219 | 318.7099 |
| 53000 | 0.8566 | 109.3136 | 539.9846 | 0.8395 | 86.3394 | 57.911 | 7.239 | 340.8982 |
| 54000 | 0.8727 | 110.7834 | 561.4149 | 0.8436 | 86.2011 | 58.004 | 7.25 | 361.5285 |
| 55000 | 0.8889 | 110.2941 | 576.0907 | 0.8421 | 86.733 | 57.648 | 7.206 | 297.2107 |
| 56000 | 0.9051 | 109.5600 | 571.4385 | 0.8433 | 86.2508 | 57.97 | 7.246 | 370.3730 |
| 57000 | 0.9212 | 109.7474 | 566.3444 | 0.8457 | 86.5407 | 57.776 | 7.222 | 900.0065 |
| 58000 | 0.9374 | 109.4155 | 621.2332 | 0.8426 | 86.3669 | 57.893 | 7.237 | 493.3487 |
| 59000 | 0.9535 | 110.1230 | 581.3542 | 0.8391 | 86.4324 | 57.849 | 7.231 | 272.2826 |
| 60000 | 0.9697 | 108.2997 | 582.5030 | 0.8340 | 86.046 | 58.108 | 7.264 | 323.5555 |
| 61000 | 0.9859 | 109.2711 | 566.8240 | 0.8381 | 86.749 | 57.638 | 7.205 | 312.4312 |
| 61875 | 1.0 | 109.1439 | 575.3599 | 0.8346 | 86.8825 | 57.549 | 7.194 | 265.7449 |
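The `epoch` column in the table is consistent with `step / 61875`, and the remaining gap to the teacher can be expressed as a perplexity ratio. The sketch below shows both calculations. It assumes that the first teacher value (30.2086) is the teacher's enwikippl and that there is no gradient accumulation on a single device, so each step consumes `train_batch_size` samples; neither is stated in the card. The function names are hypothetical.

```python
TOTAL_STEPS = 61875      # final step in the table, logged at epoch 1.0
TRAIN_BATCH_SIZE = 8     # from the training hyperparameters


def epoch_fraction(step, total_steps=TOTAL_STEPS):
    """Fraction of the single epoch completed at a given step, to 4 decimals."""
    return round(step / total_steps, 4)


def gap_to_teacher(student_ppl, teacher_ppl):
    """Ratio of student perplexity to teacher perplexity (1.0 means parity)."""
    return student_ppl / teacher_ppl


print(epoch_fraction(9000))             # prints 0.1455, matching the table row
print(TOTAL_STEPS * TRAIN_BATCH_SIZE)   # prints 495000 samples under the assumptions above
print(round(gap_to_teacher(109.1439, 30.2086), 2))  # prints 3.61
```

At the final step the student's English perplexity is still roughly 3.6 times the teacher's under these assumptions.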
Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0
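To compare a local environment against the versions listed above, a small standard-library check can help. This is a sketch only: the distribution names `distily` and `torch` are assumptions about how the packages are published, and local builds may report suffixed versions such as a CUDA tag.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; distribution names are assumed.
EXPECTED = {
    "distily": "0.2.0",
    "transformers": "4.44.0",
    "torch": "2.3.0",
    "datasets": "2.20.0",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected_version, installed_version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions().items():
    print(f"{package}: expected {wanted}, installed {installed}")
```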