---
base_model: gpt2
library_name: distily
license: mit
tags:
  - Distily
  - generated_from_trainer
model-index:
  - name: gpt2_model_card_distily_test
    results: []
---

# gpt2_model_card_distily_test

This student model is distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.
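
A minimal usage sketch with Hugging Face Transformers is shown below; the repository id is an assumption inferred from this card's model name and may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id inferred from this card's model name; adjust if needed.
repo_id = "lapp0/gpt2_model_card_distily_test"

# The student uses the same architecture and tokenizer as its GPT-2 teacher.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
student = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = student.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```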

It achieves the following results on the evaluation set:

- train_loss: 2109.4855

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_strategy: logits_activations
- loss_fn: reverse_kl (see the sketch after this list)
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
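
Distily's internal training loop is not reproduced in this card. As a rough sketch in plain PyTorch, the listed `loss_fn: reverse_kl`, Adam, and cosine-schedule settings correspond to something like the following; only the logits part of `distillation_strategy: logits_activations` is shown, and the activation-matching and embedding-training parts are omitted:

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits, teacher_logits, temperature=1.0):
    """Reverse KL, i.e. KL(student || teacher), averaged over token positions."""
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_log_probs = F.log_softmax(teacher_logits.detach() / temperature, dim=-1)
    s_probs = s_log_probs.exp()
    # sum_v p_student(v) * (log p_student(v) - log p_teacher(v)) per position, then mean
    kl_per_position = (s_probs * (s_log_probs - t_log_probs)).sum(dim=-1)
    return kl_per_position.mean()

# Optimizer and LR schedule mirroring the hyperparameters listed above
# (plain-torch stand-ins; Distily/Trainer may configure these differently).
student = torch.nn.Linear(16, 16)  # placeholder for the student model's parameters
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
num_training_steps = 1_000         # hypothetical; depends on the unspecified dataset
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_training_steps)
```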

### Model Results

| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step | train_loss |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 61518.3633 | 57357.1172 | 7104.0 | 0.1065 | 9.388 | 9.388 | 60678.2734 | 0 | |
| 0.2002002002002002 | 1984.4683 | 9672.7939 | 2192.0 | 0.0547 | 18.295 | 18.295 | 121910.375 | 200 | |
| 0.4004004004004004 | 1589.3818 | 7626.9956 | 2048.0 | 0.0545 | 18.334 | 18.334 | 74891.5859 | 400 | |
| 0.6006006006006006 | 1461.5446 | 7612.6294 | 1968.0 | 0.0554 | 18.063 | 18.063 | 75592.3516 | 600 | |
| 0.8008008008008008 | 1401.9131 | 7065.2969 | 1960.0 | 0.0547 | 18.283 | 18.283 | 59395.5664 | 800 | |
| | | | | | | | | | 2109.4855 |
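
The corpora behind `eval_enwikippl`, `eval_frwikippl`, and `eval_zhwikippl` are not identified in this card. For illustration only, token-level perplexity of a causal LM on a text sample can be computed along these lines, reusing the `student` and `tokenizer` from the loading sketch above:

```python
import torch

def token_perplexity(model, tokenizer, text, device="cpu"):
    """Perplexity = exp(mean token cross-entropy) of a causal LM on one text sample."""
    model = model.to(device).eval()
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# Example:
# print(token_perplexity(student, tokenizer, "Wikipedia is a free online encyclopedia."))
```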

### Framework versions

- Distily 0.1.0
- Transformers 4.43.3
- Pytorch 2.3.0
- Datasets 2.20.0