---
base_model: gpt2
library_name: distily
license: mit
tags:
  - Distily
  - generated_from_trainer
model-index:
  - name: gpt2_model_card_distily_test
    results: []
---

# gpt2_model_card_distily_test

This student model is distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.
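
A minimal usage sketch with Hugging Face Transformers is shown below; the repository id is an assumption inferred from this card's model name and may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id inferred from this card's model name; adjust if needed.
repo_id = "lapp0/gpt2_model_card_distily_test"

# The student uses the same architecture and tokenizer as its GPT-2 teacher.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
student = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = student.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```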

It achieves the following results on the evaluation set:

- train_loss: 2109.4855

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_strategy: logits_activations
- loss_fn: reverse_kl (see the sketch after this list)
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
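
Distily's internal training loop is not reproduced in this card. As a rough sketch in plain PyTorch, the listed `loss_fn: reverse_kl`, Adam, and cosine-schedule settings correspond to something like the following; only the logits part of `distillation_strategy: logits_activations` is shown, and the activation-matching and embedding-training parts are omitted:

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits, teacher_logits, temperature=1.0):
    """Reverse KL, i.e. KL(student || teacher), averaged over token positions."""
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_log_probs = F.log_softmax(teacher_logits.detach() / temperature, dim=-1)
    s_probs = s_log_probs.exp()
    # sum_v p_student(v) * (log p_student(v) - log p_teacher(v)) per position, then mean
    kl_per_position = (s_probs * (s_log_probs - t_log_probs)).sum(dim=-1)
    return kl_per_position.mean()

# Optimizer and LR schedule mirroring the hyperparameters listed above
# (plain-torch stand-ins; Distily/Trainer may configure these differently).
student = torch.nn.Linear(16, 16)  # placeholder for the student model's parameters
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
num_training_steps = 1_000         # hypothetical; depends on the unspecified dataset
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_training_steps)
```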

### Model Results

| epoch | eval_enwikippl | eval_frwikippl | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | eval_zhwikippl | step | train_loss |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 61518.3633 | 57357.1172 | 7104.0 | 0.1065 | 9.388 | 9.388 | 60678.2734 | 0 | |
| 0.2002002002002002 | 1984.4683 | 9672.7939 | 2192.0 | 0.0547 | 18.295 | 18.295 | 121910.375 | 200 | |
| 0.4004004004004004 | 1589.3818 | 7626.9956 | 2048.0 | 0.0545 | 18.334 | 18.334 | 74891.5859 | 400 | |
| 0.6006006006006006 | 1461.5446 | 7612.6294 | 1968.0 | 0.0554 | 18.063 | 18.063 | 75592.3516 | 600 | |
| 0.8008008008008008 | 1401.9131 | 7065.2969 | 1960.0 | 0.0547 | 18.283 | 18.283 | 59395.5664 | 800 | |
| | | | | | | | | | 2109.4855 |
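
The corpora behind `eval_enwikippl`, `eval_frwikippl`, and `eval_zhwikippl` are not identified in this card. For illustration only, token-level perplexity of a causal LM on a text sample can be computed along these lines, reusing the `student` and `tokenizer` from the loading sketch above:

```python
import torch

def token_perplexity(model, tokenizer, text, device="cpu"):
    """Perplexity = exp(mean token cross-entropy) of a causal LM on one text sample."""
    model = model.to(device).eval()
    enc = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# Example:
# print(token_perplexity(student, tokenizer, "Wikipedia is a free online encyclopedia."))
```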

### Framework versions

- Distily 0.1.0
- Transformers 4.43.3
- Pytorch 2.3.0
- Datasets 2.20.0