---
base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: gpt2_model_card_distily_test
    results: []
---

gpt2_model_card_distily_test

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
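As a minimal usage sketch, the student can be loaded with the standard transformers API. The repository id below is an assumption derived from this card's name and is not confirmed by the card itself; replace it with the actual hub id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id (based on this card's name); adjust if it differs.
repo_id = "lapp0/gpt2_model_card_distily_test"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Knowledge distillation is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```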

It achieves the following results on the evaluation set:

  • eval_enwikippl: 3305.2227
  • eval_frwikippl: 13155.6016
  • eval_zhwikippl: 73644.4141
  • eval_loss: 2480.0
  • eval_runtime: 0.0549
  • eval_samples_per_second: 18.218
  • eval_steps_per_second: 18.218
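The *_ppl metrics above are perplexities on English, French, and Chinese Wikipedia evaluation text. The exact evaluation data and windowing used by Distily are not specified here, but a causal-LM perplexity of this kind can be computed roughly as in the sketch below (the repository id and sample text are illustrative assumptions only).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity = exp(mean negative log-likelihood of each token given its
    # left context), which is exp(loss) for a causal LM with labels = inputs.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

repo_id = "lapp0/gpt2_model_card_distily_test"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()
print(perplexity(model, tokenizer, "Paris is the capital of France."))
```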

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_strategy: logits_activations
  • loss_fn: reverse_kl (see the sketch after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
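In its standard form, the reverse_kl objective listed above is KL(student ∥ teacher) computed over the models' output distributions, which penalizes probability mass the student places where the teacher places little. The sketch below shows only that standard formulation; it is not the Distily implementation, and the temperature parameter is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    # Reverse KL: KL(student || teacher) = E_student[log p_student - log p_teacher].
    # Contrast with the forward KL(teacher || student) of classic distillation.
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    return (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1).mean()

# Toy check with random logits shaped (batch, seq_len, vocab); GPT-2's vocab is 50257.
student_logits = torch.randn(2, 8, 50257, requires_grad=True)
teacher_logits = torch.randn(2, 8, 50257)
loss = reverse_kl_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```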

Resource Usage

Peak GPU Memory: 1.25 GB
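How Distily records this figure is not stated on this card; a generic way to obtain a peak-memory number like it is via PyTorch's CUDA memory statistics, sketched below (requires a CUDA device).

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the training / evaluation workload here ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.2f} GB")
```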

Model Results

| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 58716.1836 | 59308.4531 | 6848.0 | 0.078 | 12.815 | 12.815 | 56780.0039 | 0 |
| 0.2513 | 3305.2227 | 13155.6016 | 2480.0 | 0.0549 | 18.218 | 18.218 | 73644.4141 | 50 |
| 0.5025 | 2729.5120 | 12198.3467 | 2288.0 | 0.0559 | 17.885 | 17.885 | 86770.4375 | 100 |
| 0.7538 | 2523.9104 | 11865.6045 | 2240.0 | 0.0557 | 17.969 | 17.969 | 91730.1328 | 150 |

Framework versions

  • Distily 0.1.0
  • Transformers 4.43.3
  • PyTorch 2.3.0
  • Datasets 2.20.0