---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_simple_objectives2
    results: []
---

distily_bench_gpt2_simple_objectives2

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
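Since the student keeps the gpt2 architecture, the checkpoint loads with the standard Transformers API. A minimal usage sketch; the hub ID below is an assumption inferred from the model name, not confirmed by this card:

```python
# Usage sketch; the hub ID is an assumption inferred from the model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/distily_bench_gpt2_simple_objectives2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Knowledge distillation trains a small model to", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```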

It achieves the following results on the evaluation set:

  • eval_enwikippl: 512.2950
  • eval_frwikippl: 3101.1487
  • eval_zhwikippl: 191798.5312
  • eval_loss: 0.1841
  • eval_runtime: 38.3167
  • eval_samples_per_second: 52.197
  • eval_steps_per_second: 6.525
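The enwikippl, frwikippl, and zhwikippl metrics are presumably perplexities on English, French, and Chinese Wikipedia text, and eval_runtime is in seconds. Note that eval_loss is the distillation objective, not cross-entropy, so these perplexities are not exp(eval_loss). A generic sketch of how a causal LM perplexity can be computed on one text sample (the exact data slices and windowing used here are unspecified):

```python
# Perplexity sketch: exp of the mean token-level cross-entropy on one sample.
import torch

def perplexity(model, tokenizer, text):
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return mean cross-entropy;
        # the one-token label shift happens inside the model.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```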

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:jsd_loss()), activations_weight=0.2, activations_loss_fn=(fn:soft_mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:soft_mse_loss())) (sketched after this list)
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
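The distillation_objective above combines a Jensen-Shannon divergence over the logits (weight 1) with a soft MSE over intermediate activations (weight 0.2); the attention term is disabled (weight 0). Distily's exact loss implementations are not reproduced in this card, so the following is only a minimal PyTorch sketch of that weighting, with a plain MSE over matching hidden states standing in for soft_mse_loss:

```python
# Sketch of the combined objective; not Distily's actual implementation.
import math
import torch
import torch.nn.functional as F

def jsd_loss(student_logits, teacher_logits):
    # Jensen-Shannon divergence between student and teacher token distributions.
    log_p = F.log_softmax(student_logits, dim=-1)
    log_q = F.log_softmax(teacher_logits, dim=-1)
    log_m = torch.logsumexp(torch.stack([log_p, log_q]), dim=0) - math.log(2)
    kl_pm = F.kl_div(log_m, log_p, log_target=True, reduction="batchmean")
    kl_qm = F.kl_div(log_m, log_q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm)

def multi_objective(student_out, teacher_out, logits_weight=1.0, activations_weight=0.2):
    loss = logits_weight * jsd_loss(student_out.logits, teacher_out.logits)
    # Plain MSE over paired hidden states stands in for soft_mse_loss (assumption).
    pairs = list(zip(student_out.hidden_states, teacher_out.hidden_states))
    act_loss = sum(F.mse_loss(s, t) for s, t in pairs) / len(pairs)
    return loss + activations_weight * act_loss
```

Both outputs are assumed to come from forward passes with output_hidden_states=True, and with student and teacher sharing the same number of layers and hidden size (true here, since the student is also a gpt2 architecture).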

Resource Usage

Peak GPU Memory: 10.3934 GB
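A peak-memory figure like this is typically read from PyTorch's CUDA allocator statistics; a minimal sketch, assuming a single CUDA device and GB meaning 1024^3 bytes:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the training loop here ...
peak_gb = torch.cuda.max_memory_allocated() / (1024 ** 3)
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```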

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 55129.1953 | 56939.0469 | 0.5624 | 38.2552 | 52.28 | 6.535 | 54824.1562 |
| 1000 | 0.0404 | 1574.1282 | 9231.6934 | 0.2450 | 38.197 | 52.36 | 6.545 | 323126.3438 |
| 2000 | 0.0808 | 1109.2875 | 6477.1401 | 0.2300 | 38.4705 | 51.988 | 6.498 | 361869.5 |
| 3000 | 0.1212 | 959.5045 | 5362.6807 | 0.2196 | 38.2081 | 52.345 | 6.543 | 331961.1875 |
| 4000 | 0.1616 | 849.7711 | 4466.3486 | 0.2122 | 38.1788 | 52.385 | 6.548 | 229626.5781 |
| 5000 | 0.2020 | 715.5438 | 3672.9194 | 0.2054 | 38.1399 | 52.439 | 6.555 | 179842.4219 |
| 6000 | 0.2424 | 674.3274 | 3799.3496 | 0.1993 | 38.1179 | 52.469 | 6.559 | 222800.1875 |
| 7000 | 0.2828 | 571.2620 | 3382.5713 | 0.1930 | 38.2139 | 52.337 | 6.542 | 174077.0156 |
| 8000 | 0.3232 | 530.7191 | 2989.9294 | 0.1883 | 38.251 | 52.286 | 6.536 | 218381.6875 |
| 9000 | 0.3636 | 512.2950 | 3101.1487 | 0.1841 | 38.3167 | 52.197 | 6.525 | 191798.5312 |
| 10000 | 0.4040 | 465.7365 | 2662.6936 | 0.1801 | 38.3474 | 52.155 | 6.519 | 149415.4531 |
| 11000 | 0.4444 | 435.4128 | 2513.4690 | 0.1768 | 38.3141 | 52.2 | 6.525 | 239843.7656 |
| 12000 | 0.4848 | 418.3436 | 2475.8303 | 0.1744 | 38.4612 | 52.001 | 6.5 | 213309.0469 |
| 13000 | 0.5253 | 386.7266 | 2253.5813 | 0.1722 | 38.3131 | 52.201 | 6.525 | 170715.9219 |
| 14000 | 0.5657 | 387.9898 | 2286.2295 | 0.1699 | 38.4198 | 52.057 | 6.507 | 168226.6875 |
| 15000 | 0.6061 | 381.0330 | 2336.0906 | 0.1681 | 38.3633 | 52.133 | 6.517 | 192619.9219 |
| 16000 | 0.6465 | 358.8618 | 2008.6333 | 0.1662 | 38.5711 | 51.852 | 6.482 | 109902.5547 |
| 17000 | 0.6869 | 354.6786 | 1894.4617 | 0.1651 | 38.3338 | 52.173 | 6.522 | 185501.1719 |
| 18000 | 0.7273 | 351.6073 | 1982.4641 | 0.1639 | 38.3968 | 52.088 | 6.511 | 157025.25 |
| 19000 | 0.7677 | 349.6740 | 2298.5125 | 0.1630 | 38.4227 | 52.053 | 6.507 | 302094.75 |
| 20000 | 0.8081 | 331.0454 | 1852.9810 | 0.1615 | 38.3923 | 52.094 | 6.512 | 188850.5469 |
| 21000 | 0.8485 | 325.8680 | 1841.2605 | 0.1604 | 38.5044 | 51.942 | 6.493 | 98031.1953 |
| 22000 | 0.8889 | 325.0340 | 2070.4631 | 0.1595 | 38.4226 | 52.053 | 6.507 | 161017.3438 |
| 23000 | 0.9293 | 312.5102 | 1947.8259 | 0.1585 | 38.4551 | 52.009 | 6.501 | 129694.3594 |
| 24000 | 0.9697 | 313.1418 | 1909.7499 | 0.1579 | 38.2352 | 52.308 | 6.538 | 171997.4531 |
| 24750 | 1.0 | 319.7268 | 2126.0857 | 0.1576 | 38.5377 | 51.897 | 6.487 | 784044.3125 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.20.0