---
base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: gpt2_model_card_distily_test
    results: []
---

# gpt2_model_card_distily_test

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
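The student keeps the gpt2 architecture, so it loads with the standard transformers classes. The snippet below is a minimal usage sketch; the repo id is an assumption inferred from this card's name and owner and is not stated in the card itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from this card's name and owner; not stated in the card.
repo_id = "lapp0/gpt2_model_card_distily_test"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation transfers", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```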

It achieves the following results on the evaluation set (a perplexity-measurement sketch follows the list):

- eval_enwikippl: 30.2266
- eval_frwikippl: 57.3005
- eval_zhwikippl: 18.1903
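The metric names suggest perplexity measured on English, French, and Chinese Wikipedia text. The sketch below shows the standard causal-LM perplexity computation; it is illustrative only and not necessarily the exact evaluation pipeline Distily uses, and the sample text is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/gpt2_model_card_distily_test"  # assumed repo id, as above
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()

def perplexity(text: str) -> float:
    """exp of the mean next-token cross-entropy, i.e. the usual causal-LM perplexity."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the shifted LM loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Placeholder text; the corpora behind eval_enwikippl / eval_frwikippl / eval_zhwikippl
# are not specified in this card.
print(perplexity("Paris is the capital of France."))
```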

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative sketch of the reverse-KL logit loss follows the list):

- distillation_strategy: logits_activations
- loss_fn: reverse_kl
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
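For illustration, the sketch below renders a reverse-KL logit-matching objective in plain PyTorch. It is not Distily's implementation, and the `logits_activations` strategy presumably adds further terms on intermediate activations that are not shown here.

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(student || teacher) over the vocabulary, averaged across positions.

    Illustrative only; not Distily's actual loss implementation. Reverse KL is
    mode-seeking: the student is penalised for putting probability mass where
    the teacher puts very little.
    """
    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits, dim=-1)
    # Per-position KL(student || teacher), summed over the vocabulary dimension.
    kl = (log_p_s.exp() * (log_p_s - log_p_t)).sum(dim=-1)
    return kl.mean()

# Shapes follow gpt2: (batch, sequence, vocab_size=50257).
student_logits = torch.randn(1, 8, 50257)
teacher_logits = torch.randn(1, 8, 50257)
loss = reverse_kl_loss(student_logits, teacher_logits)
```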

## Resource Usage

Peak GPU Memory: 1.2453 GB
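One common way to obtain such a figure in PyTorch is via the CUDA memory statistics, as in the sketch below; whether Distily records it exactly this way is not stated in this card.

```python
import torch

# Requires a CUDA device.
torch.cuda.reset_peak_memory_stats()
# ... run the distillation / evaluation loop here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```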

## Model Results

| epoch   | step | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl |
|---------|------|----------------|----------------|-----------|--------------|-------------------------|-----------------------|----------------|
| teacher |      | 30.2266        | 57.3005        |           |              |                         |                       | 18.1903        |
| 0       | 0    | 53288.7773     | 55702.1719     | 0.0041    | 0.0758       | 13.185                  | 13.185                | 55025.875      |
| 0.4040  | 40   | 20265.3535     | 39300.7383     | 0.0004    | 0.0554       | 18.059                  | 18.059                | 53151.6875     |
| 0.8081  | 80   | 17527.1328     | 38131.125      | 0.0004    | 0.0553       | 18.096                  | 18.096                | 51728.4688     |

## Framework versions

- Distily 0.1.0
- Transformers 4.43.3
- Pytorch 2.3.0
- Datasets 2.20.0