---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_simple_objectives
    results: []
---

# distily_bench_gpt2_simple_objectives

This student model was distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.
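Because the student keeps the gpt2 architecture, it should load with the standard transformers API. A minimal usage sketch follows; the repository id below is an assumption based on the model name above, so substitute the actual Hub path if it differs.

```python
# Minimal usage sketch. The repo id is assumed from the model name above;
# substitute the actual Hub path if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_gpt2_simple_objectives"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```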

It achieves the following results on the evaluation set:

- eval_enwikippl: 213.1260
- eval_frwikippl: 1238.3538
- eval_zhwikippl: 689.7033
- eval_loss: 1.2684
- eval_runtime: 33.9389
- eval_samples_per_second: 58.929
- eval_steps_per_second: 7.366
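The three `*ppl` values are presumably perplexities on English, French, and Chinese Wikipedia text, while `eval_loss` is the distillation objective itself, so `exp(eval_loss)` is not expected to match the perplexities. For reference, perplexity is conventionally the exponential of the mean per-token cross-entropy; the following is a generic sketch, not Distily's exact evaluation code.

```python
# Generic perplexity sketch: exp(mean per-token cross-entropy).
# Illustrative only; Distily's evaluation code may differ in detail.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

enc = tokenizer("The quick brown fox jumps over the lazy dog.",
                return_tensors="pt")
with torch.no_grad():
    # With labels supplied, the model returns mean cross-entropy over tokens.
    loss = model(**enc, labels=enc["input_ids"]).loss
print(torch.exp(loss).item())  # perplexity of this sample
```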

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0, activations_loss_fn=(fn:mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:mse_loss())) (a minimal sketch of this objective appears after this list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
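With activations_weight and attentions_weight both 0, the objective above reduces to a KL divergence between the student's and teacher's next-token distributions. Below is a minimal sketch of such a logits-only loss; it illustrates the general technique and is not Distily's internal implementation.

```python
# Sketch of a logits-only KL-divergence distillation loss, matching a
# configuration with logits_weight=1 and zero activation/attention weights.
# Not Distily's internal code; an illustration of the general technique.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits: torch.Tensor,
                         teacher_logits: torch.Tensor) -> torch.Tensor:
    # Flatten to (num_tokens, vocab) so 'batchmean' averages KL per token.
    s = student_logits.reshape(-1, student_logits.size(-1))
    t = teacher_logits.reshape(-1, teacher_logits.size(-1))
    return F.kl_div(F.log_softmax(s, dim=-1),
                    F.log_softmax(t, dim=-1),
                    log_target=True, reduction="batchmean")

# Example with random logits shaped (batch, seq_len, vocab_size).
student = torch.randn(2, 8, 50257)
teacher = torch.randn(2, 8, 50257)
print(kl_distillation_loss(student, teacher))
```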

### Resource Usage

Peak GPU Memory: 7.9371 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 57983.2695 | 56826.7539 | 5.9504 | 33.9223 | 58.958 | 7.37 | 51544.0508 |
| 1000 | 0.0404 | 716.3218 | 4663.2852 | 1.9522 | 34.1014 | 58.649 | 7.331 | 17271.0391 |
| 2000 | 0.0808 | 512.1357 | 3224.2202 | 1.7690 | 34.1187 | 58.619 | 7.327 | 2109.2849 |
| 3000 | 0.1212 | 418.9938 | 2658.5667 | 1.6652 | 34.1292 | 58.601 | 7.325 | 1129.3704 |
| 4000 | 0.1616 | 367.4342 | 2491.9417 | 1.5763 | 34.0919 | 58.665 | 7.333 | 798.7274 |
| 5000 | 0.2020 | 317.3523 | 1897.4025 | 1.4963 | 33.965 | 58.884 | 7.361 | 962.9218 |
| 6000 | 0.2424 | 282.9857 | 1585.8464 | 1.4222 | 33.9768 | 58.864 | 7.358 | 852.0554 |
| 7000 | 0.2828 | 251.4994 | 1421.8730 | 1.3623 | 33.9388 | 58.93 | 7.366 | 753.7527 |
| 8000 | 0.3232 | 229.7460 | 1314.6521 | 1.3137 | 34.0289 | 58.773 | 7.347 | 729.5888 |
| 9000 | 0.3636 | 213.1260 | 1238.3538 | 1.2684 | 33.9389 | 58.929 | 7.366 | 689.7033 |
| 10000 | 0.4040 | 197.5243 | 1147.7201 | 1.2172 | 34.1028 | 58.646 | 7.331 | 761.6445 |
| 11000 | 0.4444 | 178.5023 | 1065.9717 | 1.1681 | 34.111 | 58.632 | 7.329 | 697.0179 |
| 12000 | 0.4848 | 164.3850 | 941.9713 | 1.1267 | 34.1042 | 58.644 | 7.33 | 722.8970 |
| 13000 | 0.5253 | 157.2920 | 871.0618 | 1.0965 | 34.1353 | 58.59 | 7.324 | 484.9227 |
| 14000 | 0.5657 | 150.8093 | 806.3426 | 1.0674 | 34.0619 | 58.717 | 7.34 | 539.5954 |
| 15000 | 0.6061 | 143.2526 | 816.5259 | 1.0499 | 34.2668 | 58.366 | 7.296 | 509.8925 |
| 16000 | 0.6465 | 139.8671 | 715.0598 | 1.0314 | 34.0375 | 58.759 | 7.345 | 426.2927 |
| 17000 | 0.6869 | 134.8648 | 739.3088 | 1.0151 | 34.0663 | 58.709 | 7.339 | 458.1682 |
| 18000 | 0.7273 | 132.5907 | 675.8909 | 1.0007 | 33.9807 | 58.857 | 7.357 | 348.7257 |
| 19000 | 0.7677 | 129.5074 | 665.1128 | 0.9937 | 34.017 | 58.794 | 7.349 | 350.5464 |
| 20000 | 0.8081 | 127.9778 | 683.8963 | 0.9837 | 33.9292 | 58.946 | 7.368 | 395.9997 |
| 21000 | 0.8485 | 125.7319 | 659.5090 | 0.9754 | 33.985 | 58.849 | 7.356 | 518.3367 |
| 22000 | 0.8889 | 124.8950 | 691.0702 | 0.9696 | 34.2015 | 58.477 | 7.31 | 610.1314 |
| 23000 | 0.9293 | 123.7751 | 644.4776 | 0.9625 | 34.1656 | 58.538 | 7.317 | 321.7459 |
| 24000 | 0.9697 | 122.1613 | 658.5797 | 0.9586 | 33.975 | 58.867 | 7.358 | 353.6970 |
| 24750 | 1.0 | 119.9802 | 652.2029 | 0.9537 | 34.2146 | 58.455 | 7.307 | 339.4447 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0
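A trivial sketch to confirm the pinned versions at runtime, assuming the packages are installed:

```python
# Quick environment check against the versions listed above.
# Distily's version attribute is not confirmed here, so it is omitted.
import datasets
import torch
import transformers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.3.0
print(datasets.__version__)      # expected: 2.20.0
```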