---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_simple_objectives
    results: []
---

# distily_bench_gpt2_simple_objectives

This student model was distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.
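Because the student keeps the gpt2 architecture, it should load with the standard transformers API. A minimal usage sketch follows; the repository id below is an assumption based on the model name above, so substitute the actual Hub path if it differs.

```python
# Minimal usage sketch. The repo id is assumed from the model name above;
# substitute the actual Hub path if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_gpt2_simple_objectives"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```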

It achieves the following results on the evaluation set:

- eval_enwikippl: 213.1260
- eval_frwikippl: 1238.3538
- eval_zhwikippl: 689.7033
- eval_loss: 1.2684
- eval_runtime: 33.9389
- eval_samples_per_second: 58.929
- eval_steps_per_second: 7.366
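The three `*ppl` values are presumably perplexities on English, French, and Chinese Wikipedia text, while `eval_loss` is the distillation objective itself, so `exp(eval_loss)` is not expected to match the perplexities. For reference, perplexity is conventionally the exponential of the mean per-token cross-entropy; the following is a generic sketch, not Distily's exact evaluation code.

```python
# Generic perplexity sketch: exp(mean per-token cross-entropy).
# Illustrative only; Distily's evaluation code may differ in detail.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

enc = tokenizer("The quick brown fox jumps over the lazy dog.",
                return_tensors="pt")
with torch.no_grad():
    # With labels supplied, the model returns mean cross-entropy over tokens.
    loss = model(**enc, labels=enc["input_ids"]).loss
print(torch.exp(loss).item())  # perplexity of this sample
```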

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0, activations_loss_fn=(fn:mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:mse_loss())) (a minimal sketch of this objective appears after this list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
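With activations_weight and attentions_weight both 0, the objective above reduces to a KL divergence between the student's and teacher's next-token distributions. Below is a minimal sketch of such a logits-only loss; it illustrates the general technique and is not Distily's internal implementation.

```python
# Sketch of a logits-only KL-divergence distillation loss, matching a
# configuration with logits_weight=1 and zero activation/attention weights.
# Not Distily's internal code; an illustration of the general technique.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor) -> torch.Tensor:
    # Flatten to (num_tokens, vocab) so 'batchmean' averages KL per token.
    s = student_logits.reshape(-1, student_logits.size(-1))
    t = teacher_logits.reshape(-1, teacher_logits.size(-1))
    return F.kl_div(F.log_softmax(s, dim=-1),
                    F.log_softmax(t, dim=-1),
                    log_target=True, reduction="batchmean")

# Example with random logits shaped (batch, seq_len, vocab_size).
student = torch.randn(2, 8, 50257)
teacher = torch.randn(2, 8, 50257)
print(kl_distillation_loss(student, teacher))
```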

### Resource Usage

Peak GPU Memory: 7.9371 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 57983.2695 | 56826.7539 | 5.9504 | 33.9223 | 58.958 | 7.37 | 51544.0508 |
| 1000 | 0.0404 | 716.3218 | 4663.2852 | 1.9522 | 34.1014 | 58.649 | 7.331 | 17271.0391 |
| 2000 | 0.0808 | 512.1357 | 3224.2202 | 1.7690 | 34.1187 | 58.619 | 7.327 | 2109.2849 |
| 3000 | 0.1212 | 418.9938 | 2658.5667 | 1.6652 | 34.1292 | 58.601 | 7.325 | 1129.3704 |
| 4000 | 0.1616 | 367.4342 | 2491.9417 | 1.5763 | 34.0919 | 58.665 | 7.333 | 798.7274 |
| 5000 | 0.2020 | 317.3523 | 1897.4025 | 1.4963 | 33.965 | 58.884 | 7.361 | 962.9218 |
| 6000 | 0.2424 | 282.9857 | 1585.8464 | 1.4222 | 33.9768 | 58.864 | 7.358 | 852.0554 |
| 7000 | 0.2828 | 251.4994 | 1421.8730 | 1.3623 | 33.9388 | 58.93 | 7.366 | 753.7527 |
| 8000 | 0.3232 | 229.7460 | 1314.6521 | 1.3137 | 34.0289 | 58.773 | 7.347 | 729.5888 |
| 9000 | 0.3636 | 213.1260 | 1238.3538 | 1.2684 | 33.9389 | 58.929 | 7.366 | 689.7033 |
| 10000 | 0.4040 | 197.5243 | 1147.7201 | 1.2172 | 34.1028 | 58.646 | 7.331 | 761.6445 |
| 11000 | 0.4444 | 178.5023 | 1065.9717 | 1.1681 | 34.111 | 58.632 | 7.329 | 697.0179 |
| 12000 | 0.4848 | 164.3850 | 941.9713 | 1.1267 | 34.1042 | 58.644 | 7.33 | 722.8970 |
| 13000 | 0.5253 | 157.2920 | 871.0618 | 1.0965 | 34.1353 | 58.59 | 7.324 | 484.9227 |
| 14000 | 0.5657 | 150.8093 | 806.3426 | 1.0674 | 34.0619 | 58.717 | 7.34 | 539.5954 |
| 15000 | 0.6061 | 143.2526 | 816.5259 | 1.0499 | 34.2668 | 58.366 | 7.296 | 509.8925 |
| 16000 | 0.6465 | 139.8671 | 715.0598 | 1.0314 | 34.0375 | 58.759 | 7.345 | 426.2927 |
| 17000 | 0.6869 | 134.8648 | 739.3088 | 1.0151 | 34.0663 | 58.709 | 7.339 | 458.1682 |
| 18000 | 0.7273 | 132.5907 | 675.8909 | 1.0007 | 33.9807 | 58.857 | 7.357 | 348.7257 |
| 19000 | 0.7677 | 129.5074 | 665.1128 | 0.9937 | 34.017 | 58.794 | 7.349 | 350.5464 |
| 20000 | 0.8081 | 127.9778 | 683.8963 | 0.9837 | 33.9292 | 58.946 | 7.368 | 395.9997 |
| 21000 | 0.8485 | 125.7319 | 659.5090 | 0.9754 | 33.985 | 58.849 | 7.356 | 518.3367 |
| 22000 | 0.8889 | 124.8950 | 691.0702 | 0.9696 | 34.2015 | 58.477 | 7.31 | 610.1314 |
| 23000 | 0.9293 | 123.7751 | 644.4776 | 0.9625 | 34.1656 | 58.538 | 7.317 | 321.7459 |
| 24000 | 0.9697 | 122.1613 | 658.5797 | 0.9586 | 33.975 | 58.867 | 7.358 | 353.6970 |
| 24750 | 1.0 | 119.9802 | 652.2029 | 0.9537 | 34.2146 | 58.455 | 7.307 | 339.4447 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0
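A trivial sketch to confirm the pinned versions at runtime, assuming the packages are installed:

```python
# Quick environment check against the versions listed above.
# Distily's version attribute is not confirmed here, so it is omitted.
import datasets
import torch
import transformers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.3.0
print(datasets.__version__)      # expected: 2.20.0
```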