---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2.12b_gpt2
    results: []
---

# distily_bench_obj_cross_v2.12b_gpt2

This student model was distilled from the teacher model gpt2 on an unspecified dataset, using the Distily library.

It achieves the following results on the evaluation set:

- eval_enwikippl: 249.0
- eval_frwikippl: 600.0
- eval_zhwikippl: 186.0
- eval_tinystoriesppl: 220.0
- eval_loss: 0.9819
- eval_runtime: 12.7319
- eval_samples_per_second: 47.126
- eval_steps_per_second: 11.781
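
As a quick consistency check, the reported eval runtime, samples-per-second, and steps-per-second figures imply roughly 600 evaluation samples processed in about 150 steps, i.e. a per-step batch of 4 (matching the eval_batch_size listed under the training hyperparameters below). A minimal sketch of the arithmetic:

```python
# Sanity-check that the reported eval throughput numbers are mutually
# consistent: samples ≈ runtime * samples_per_second, and
# samples_per_second / steps_per_second ≈ eval batch size.

eval_runtime = 12.7319        # seconds, as reported
samples_per_second = 47.126
steps_per_second = 11.781

num_samples = eval_runtime * samples_per_second      # ≈ 600 eval samples
num_steps = eval_runtime * steps_per_second          # ≈ 150 eval steps
implied_batch = samples_per_second / steps_per_second

print(round(num_samples), round(num_steps), round(implied_batch))  # → 600 150 4
```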

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))`
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0
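
The distillation objective above puts all of its weight on a KL-divergence loss over the logits (the hidden-state and attention components have weight 0). Distily's internal implementation is not shown here; the following is a minimal, self-contained sketch of a KL logits loss for a single token position, with hypothetical helper names:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); teacher is p, student is q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def logits_distillation_loss(teacher_logits, student_logits):
    # Weight-1 KL loss on the logits, analogous to the objective above.
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))

teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.5]
print(logits_distillation_loss(teacher, student))  # small positive value
```

The loss is zero only when the student's distribution matches the teacher's exactly, and positive otherwise, which is what drives the student's logits toward the teacher's during training.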

## Resource Usage

Peak GPU Memory: 4.1856 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 837518622720.0 | 78065325572096.0 | 19.8108 | 12.6525 | 47.421 | 11.855 | 2667577344.0 | 36009005809664.0 |
| 1500 | 0.1010 | 1472.0 | 8832.0 | 2.5979 | 12.6262 | 47.52 | 11.88 | 1056.0 | 19200.0 |
| 3000 | 0.2020 | 500.0 | 3040.0 | 1.8976 | 12.7775 | 46.958 | 11.739 | 354.0 | 552.0 |
| 4500 | 0.3030 | 312.0 | 1320.0 | 1.5456 | 12.7017 | 47.238 | 11.809 | 249.0 | 260.0 |
| 6000 | 0.4040 | 234.0 | 940.0 | 1.3441 | 12.5854 | 47.674 | 11.919 | 204.0 | 158.0 |
| 7500 | 0.5051 | 190.0 | 656.0 | 1.1277 | 12.5936 | 47.643 | 11.911 | 164.0 | 152.0 |
| 9000 | 0.6061 | 249.0 | 600.0 | 0.9819 | 12.7319 | 47.126 | 11.781 | 220.0 | 186.0 |
| 10500 | 0.7071 | 141.0 | 436.0 | 0.8717 | 12.5874 | 47.667 | 11.917 | 121.0 | 128.0 |
| 12000 | 0.8081 | 193.0 | 482.0 | 0.8292 | 12.6439 | 47.454 | 11.863 | 163.0 | 135.0 |
| 13500 | 0.9091 | 202.0 | 504.0 | 0.8078 | 12.5913 | 47.652 | 11.913 | 176.0 | 136.0 |
| 14850 | 1.0 | 196.0 | 490.0 | 0.8045 | 12.677 | 47.33 | 11.832 | 170.0 | 135.0 |
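
With lr_scheduler_warmup_ratio: 0.5 and the 14,850 total steps shown in the table above, a linear schedule with warmup ramps the learning rate from 0 to the peak of 4e-05 over the first 7,425 steps, then decays it linearly back to 0. A sketch of that schedule (assuming the standard linear warmup-then-decay shape; `linear_schedule_lr` is a hypothetical helper, not part of Distily):

```python
def linear_schedule_lr(step, total_steps=14850, base_lr=4e-05, warmup_ratio=0.5):
    # Linear warmup over the first warmup_ratio of training, then linear
    # decay to zero, matching lr_scheduler_type=linear with warmup_ratio=0.5.
    warmup_steps = int(total_steps * warmup_ratio)  # 7425 steps
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

for s in (0, 7425, 14850):
    print(s, linear_schedule_lr(s))
```

Note that with a 0.5 warmup ratio the learning rate only reaches its peak at the halfway point of the run, which is consistent with the eval loss still dropping steadily through the second half of the table.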

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0