---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_simple_objectives
  results: []
---

# distily_bench_gpt2_simple_objectives

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
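
As a quick sanity check, the student can be loaded like any other causal LM checkpoint. The repo id below is an assumption inferred from the model name above; substitute the actual Hub path if it differs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id, inferred from the model name above.
model_id = "lapp0/distily_bench_gpt2_simple_objectives"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Distillation compresses a teacher model by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```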

The model achieves the following results on the evaluation set:

- eval_enwikippl: 433.0859
- eval_frwikippl: 2823.5620
- eval_zhwikippl: 4932.8379
- eval_loss: 21.1035
- eval_runtime: 34.4485
- eval_samples_per_second: 58.058
- eval_steps_per_second: 7.257
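
The `*wikippl` metrics are perplexities on English, French, and Chinese Wikipedia text; perplexity is the exponential of the mean token-level cross-entropy. A minimal sketch of that relationship, reusing the model and tokenizer loaded above (the sample text is a stand-in, not Distily's eval data):

```python
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity = exp(mean cross-entropy over tokens).
    enc = tokenizer(text, return_tensors="pt")
    loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

print(perplexity(model, tokenizer, "The quick brown fox jumps over the lazy dog."))
```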

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0.1, activations_loss_fn=(fn:mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:mse_loss())) (sketched after this list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
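
The MultiObjective above combines a KL-divergence loss on the student/teacher logits (weight 1) with an MSE loss on hidden activations (weight 0.1); the attention-map loss is disabled (weight 0). Below is a minimal PyTorch sketch of that weighted combination; Distily's actual implementation details (layer mapping, reductions, temperature handling) may differ.

```python
import torch
import torch.nn.functional as F

def multi_objective_loss(student_logits, teacher_logits,
                         student_hidden, teacher_hidden,
                         logits_weight=1.0, activations_weight=0.1):
    # KL divergence between the teacher's and student's token distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    # MSE between corresponding hidden states (assumes matching shapes).
    mse = torch.stack(
        [F.mse_loss(s, t) for s, t in zip(student_hidden, teacher_hidden)]
    ).mean()
    return logits_weight * kl + activations_weight * mse
```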

### Resource Usage

Peak GPU Memory: 8.0893 GB
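
One way such a figure is typically read out with PyTorch (whether Distily uses this exact accounting is an assumption):

```python
import torch

# After the training run, on the training device:
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```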

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 54069.2930 | 57285.3438 | 69.6280 | 34.3114 | 58.29 | 7.286 | 54227.1016 |
| 1000 | 0.0404 | 1149.4497 | 6758.9292 | 22.9270 | 34.3626 | 58.203 | 7.275 | 55191.4258 |
| 2000 | 0.0808 | 848.3209 | 5094.3662 | 22.2020 | 34.3795 | 58.174 | 7.272 | 14284.0166 |
| 3000 | 0.1212 | 700.4797 | 4480.8540 | 21.8288 | 34.371 | 58.189 | 7.274 | 7045.9990 |
| 4000 | 0.1616 | 615.9059 | 3635.8176 | 21.5565 | 34.4355 | 58.08 | 7.26 | 3316.0488 |
| 5000 | 0.2020 | 556.0313 | 3492.5959 | 21.4455 | 34.3262 | 58.265 | 7.283 | 4788.7505 |
| 6000 | 0.2424 | 528.5394 | 3328.1577 | 21.2810 | 34.3681 | 58.193 | 7.274 | 3058.2744 |
| 7000 | 0.2828 | 479.2375 | 2988.6665 | 21.2197 | 34.3863 | 58.163 | 7.27 | 3689.9192 |
| 8000 | 0.3232 | 448.9053 | 2847.9541 | 21.0785 | 34.5149 | 57.946 | 7.243 | 1743.5521 |
| 9000 | 0.3636 | 433.0859 | 2823.5620 | 21.1035 | 34.4485 | 58.058 | 7.257 | 4932.8379 |
| 10000 | 0.4040 | 423.8369 | 2843.9414 | 21.0105 | 34.4298 | 58.089 | 7.261 | 3959.4795 |
| 11000 | 0.4444 | 394.3074 | 2524.8374 | 20.9575 | 34.5178 | 57.941 | 7.243 | 6243.0879 |
| 12000 | 0.4848 | 385.4673 | 2595.5920 | 20.9185 | 34.4535 | 58.049 | 7.256 | 17321.8613 |
| 13000 | 0.5253 | 369.9537 | 2477.9255 | 20.8475 | 34.4953 | 57.979 | 7.247 | 2443.6860 |
| 14000 | 0.5657 | 358.8618 | 2519.8567 | 20.7897 | 34.9016 | 57.304 | 7.163 | 3639.9983 |
| 15000 | 0.6061 | 343.0577 | 2395.4692 | 20.7710 | 34.3143 | 58.285 | 7.286 | 1816.2738 |
| 16000 | 0.6465 | 343.8312 | 2195.5515 | 20.7428 | 34.184 | 58.507 | 7.313 | 14709.8760 |
| 17000 | 0.6869 | 336.7496 | 2234.2798 | 20.7590 | 34.4691 | 58.023 | 7.253 | 6489.5991 |
| 18000 | 0.7273 | 338.3747 | 2191.5310 | 20.6583 | 34.4634 | 58.033 | 7.254 | 2819.0298 |
| 19000 | 0.7677 | 324.3280 | 2071.9238 | 20.6345 | 34.4307 | 58.088 | 7.261 | 3877.8486 |
| 20000 | 0.8081 | 315.1911 | 2056.7864 | 20.5710 | 34.2186 | 58.448 | 7.306 | 3151.9771 |
| 21000 | 0.8485 | 315.4604 | 2161.1489 | 20.5432 | 34.5086 | 57.957 | 7.245 | 3105.1853 |
| 22000 | 0.8889 | 324.6304 | 1950.2999 | 20.6125 | 34.2565 | 58.383 | 7.298 | 2055.8921 |
| 23000 | 0.9293 | 313.9452 | 1958.0153 | 20.5900 | 34.5413 | 57.902 | 7.238 | 4405.8896 |
| 24000 | 0.9697 | 311.3475 | 1918.9283 | 20.5405 | 34.2718 | 58.357 | 7.295 | 11800.9756 |
| 24750 | 1.0 | 303.2348 | 1956.3597 | 20.4700 | 34.3296 | 58.259 | 7.282 | 15104.0020 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0