---
base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: gpt2_model_card_distily_test
    results: []
---

gpt2_model_card_distily_test

This student model is distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.
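
The student shares GPT-2's architecture and tokenizer, so it can be loaded and queried with the standard transformers API. A minimal sketch, assuming the checkpoint is published under the repository id lapp0/gpt2_model_card_distily_test (inferred from this card's model name, not stated explicitly):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id is an assumption based on the model name in this card;
# substitute the actual hub id if it differs.
repo_id = "lapp0/gpt2_model_card_distily_test"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation as a quick sanity check of the student.
inputs = tokenizer("Knowledge distillation compresses a model by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```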

It achieves the following results on the evaluation set (a perplexity sketch follows this list):

  • eval_enwikippl: 3251.3369
  • eval_frwikippl: 12842.3994
  • eval_zhwikippl: 91987.7734
  • eval_loss: 2288.0
  • eval_runtime: 0.0553
  • eval_samples_per_second: 18.087
  • eval_steps_per_second: 18.087
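
The three perplexity metrics (eval_enwikippl, eval_frwikippl, eval_zhwikippl) appear to be perplexity measured on English, French, and Chinese Wikipedia samples respectively. As a rough illustration of how a causal-LM perplexity of this kind can be computed (not Distily's exact evaluation code), one can exponentiate the mean token-level cross-entropy:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    """Exponentiated mean token cross-entropy; illustrative, not Distily's exact metric."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the shifted LM loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
print(perplexity(model, tokenizer, "Paris is the capital of France."))
```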

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_strategy: logits_activations
  • loss_fn: reverse_kl (see the sketch after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
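
With loss_fn set to reverse_kl, the distillation objective is the KL divergence taken in the student-to-teacher direction, KL(student || teacher), which tends to be mode-seeking: the student concentrates on the teacher's high-probability tokens rather than spreading mass over the whole distribution. Below is a minimal PyTorch sketch of a reverse-KL logit-matching loss; it illustrates the technique only and is not Distily's implementation (temperature scaling and the hidden-state terms implied by distillation_strategy: logits_activations are omitted):

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(student || teacher) over the vocabulary, averaged across positions.

    Illustrative sketch; Distily's actual loss may differ in reduction,
    temperature, and additional activation-matching terms.
    """
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    student_probs = student_log_probs.exp()
    # Expectation is taken under the *student* distribution (reverse KL).
    return (student_probs * (student_log_probs - teacher_log_probs)).sum(dim=-1).mean()

# Toy tensors with GPT-2's vocabulary size: (batch, sequence, vocab).
student_logits = torch.randn(1, 8, 50257, requires_grad=True)
teacher_logits = torch.randn(1, 8, 50257)

loss = reverse_kl_loss(student_logits, teacher_logits.detach())
loss.backward()
```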

Resource Usage

Peak GPU Memory: 1.2452 GB

Model Results

| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step |
|-------|----------------|----------------|-----------|--------------|-------------------------|-----------------------|----------------|------|
| 0      | 58331.5781 | 58190.1172 | 6944.0 | 0.0763 | 13.107 | 13.107 | 54568.5117  | 0   |
| 0.2513 | 3251.3369  | 12842.3994 | 2288.0 | 0.0553 | 18.087 | 18.087 | 91987.7734  | 50  |
| 0.5025 | 2778.4973  | 13039.9355 | 2080.0 | 0.0561 | 17.833 | 17.833 | 100748.5312 | 100 |
| 0.7538 | 2581.9565  | 12580.9199 | 2048.0 | 0.0551 | 18.153 | 18.153 | 110134.0156 | 150 |

Framework versions

  • Distily 0.1.0
  • Transformers 4.43.3
  • Pytorch 2.3.0
  • Datasets 2.20.0