---
base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: gpt2_model_card_distily_test
    results: []
---

# gpt2_model_card_distily_test

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
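The student keeps the gpt2 architecture, so it loads with the standard transformers classes. The snippet below is a minimal usage sketch; the repo id is an assumption inferred from this card's name and owner and is not stated in the card itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, inferred from this card's name and owner; not stated in the card.
repo_id = "lapp0/gpt2_model_card_distily_test"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation transfers", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```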

It achieves the following results on the evaluation set (a perplexity-measurement sketch follows the list):

- eval_enwikippl: 30.2266
- eval_frwikippl: 57.3005
- eval_zhwikippl: 18.1903
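The metric names suggest perplexity measured on English, French, and Chinese Wikipedia text. The sketch below shows the standard causal-LM perplexity computation; it is illustrative only and not necessarily the exact evaluation pipeline Distily uses, and the sample text is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/gpt2_model_card_distily_test"  # assumed repo id, as above
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()

def perplexity(text: str) -> float:
    """exp of the mean next-token cross-entropy, i.e. the usual causal-LM perplexity."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the shifted LM loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Placeholder text; the corpora behind eval_enwikippl / eval_frwikippl / eval_zhwikippl
# are not specified in this card.
print(perplexity("Paris is the capital of France."))
```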

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative sketch of the reverse-KL logit loss follows the list):

- distillation_strategy: logits_activations
- loss_fn: reverse_kl
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
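For illustration, the sketch below renders a reverse-KL logit-matching objective in plain PyTorch. It is not Distily's implementation, and the `logits_activations` strategy presumably adds further terms on intermediate activations that are not shown here.

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(student || teacher) over the vocabulary, averaged across positions.

    Illustrative only; not Distily's actual loss implementation. Reverse KL is
    mode-seeking: the student is penalised for putting probability mass where
    the teacher puts very little.
    """
    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits, dim=-1)
    # Per-position KL(student || teacher), summed over the vocabulary dimension.
    kl = (log_p_s.exp() * (log_p_s - log_p_t)).sum(dim=-1)
    return kl.mean()

# Shapes follow gpt2: (batch, sequence, vocab_size=50257).
student_logits = torch.randn(1, 8, 50257)
teacher_logits = torch.randn(1, 8, 50257)
loss = reverse_kl_loss(student_logits, teacher_logits)
```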

## Resource Usage

Peak GPU Memory: 1.2453 GB
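One common way to obtain such a figure in PyTorch is via the CUDA memory statistics, as in the sketch below; whether Distily records it exactly this way is not stated in this card.

```python
import torch

# Requires a CUDA device.
torch.cuda.reset_peak_memory_stats()
# ... run the distillation / evaluation loop here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```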

## Model Results

| epoch   | step | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl |
|---------|------|----------------|----------------|-----------|--------------|-------------------------|-----------------------|----------------|
| teacher |      | 30.2266        | 57.3005        |           |              |                         |                       | 18.1903        |
| 0       | 0    | 53288.7773     | 55702.1719     | 0.0041    | 0.0758       | 13.185                  | 13.185                | 55025.875      |
| 0.4040  | 40   | 20265.3535     | 39300.7383     | 0.0004    | 0.0554       | 18.059                  | 18.059                | 53151.6875     |
| 0.8081  | 80   | 17527.1328     | 38131.125      | 0.0004    | 0.0553       | 18.096                  | 18.096                | 51728.4688     |

## Framework versions

- Distily 0.1.0
- Transformers 4.43.3
- Pytorch 2.3.0
- Datasets 2.20.0