---
base_model: gpt2
library_name: distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: gpt2_model_card_distily_test
    results: []
---

gpt2_model_card_distily_test

This student model was distilled from the teacher model gpt2; the training dataset is unspecified.

The Distily library was used for this distillation.
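As a minimal usage sketch, the student can be loaded with the standard transformers API. The repository id below is an assumption derived from this card's name and is not confirmed by the card itself; replace it with the actual hub id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id (based on this card's name); adjust if it differs.
repo_id = "lapp0/gpt2_model_card_distily_test"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Knowledge distillation is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```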

It achieves the following results on the evaluation set:

  • eval_enwikippl: 3305.2227
  • eval_frwikippl: 13155.6016
  • eval_zhwikippl: 73644.4141
  • eval_loss: 2480.0
  • eval_runtime: 0.0549
  • eval_samples_per_second: 18.218
  • eval_steps_per_second: 18.218
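The *_ppl metrics above are perplexities on English, French, and Chinese Wikipedia evaluation text. The exact evaluation data and windowing used by Distily are not specified here, but a causal-LM perplexity of this kind can be computed roughly as in the sketch below (the repository id and sample text are illustrative assumptions only).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity = exp(mean negative log-likelihood of each token given its
    # left context), which is exp(loss) for a causal LM with labels = inputs.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

repo_id = "lapp0/gpt2_model_card_distily_test"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()
print(perplexity(model, tokenizer, "Paris is the capital of France."))
```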

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_strategy: logits_activations
  • loss_fn: reverse_kl (see the sketch after this list)
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 1.0
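In its standard form, the reverse_kl objective listed above is KL(student ∥ teacher) computed over the models' output distributions, which penalizes probability mass the student places where the teacher places little. The sketch below shows only that standard formulation; it is not the Distily implementation, and the temperature parameter is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    # Reverse KL: KL(student || teacher) = E_student[log p_student - log p_teacher].
    # Contrast with the forward KL(teacher || student) of classic distillation.
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    return (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1).mean()

# Toy check with random logits shaped (batch, seq_len, vocab); GPT-2's vocab is 50257.
student_logits = torch.randn(2, 8, 50257, requires_grad=True)
teacher_logits = torch.randn(2, 8, 50257)
loss = reverse_kl_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```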

Resource Usage

Peak GPU Memory: 1.25 GB
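How Distily records this figure is not stated on this card; a generic way to obtain a peak-memory number like it is via PyTorch's CUDA memory statistics, sketched below (requires a CUDA device).

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the training / evaluation workload here ...

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU Memory: {peak_gb:.2f} GB")
```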

Model Results

| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 58716.1836 | 59308.4531 | 6848.0 | 0.078 | 12.815 | 12.815 | 56780.0039 | 0 |
| 0.2513 | 3305.2227 | 13155.6016 | 2480.0 | 0.0549 | 18.218 | 18.218 | 73644.4141 | 50 |
| 0.5025 | 2729.5120 | 12198.3467 | 2288.0 | 0.0559 | 17.885 | 17.885 | 86770.4375 | 100 |
| 0.7538 | 2523.9104 | 11865.6045 | 2240.0 | 0.0557 | 17.969 | 17.969 | 91730.1328 | 150 |

Framework versions

  • Distily 0.1.0
  • Transformers 4.43.3
  • PyTorch 2.3.0
  • Datasets 2.20.0