lapp0 committed (verified) · Commit 40ac416 · Parent(s): 2f544aa

End of training

README.md CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 218.2680
- - eval_frwikippl: 1270.0094
- - eval_zhwikippl: 675.2138
- - eval_loss: 1.2941
- - eval_runtime: 34.9208
- - eval_samples_per_second: 57.272
- - eval_steps_per_second: 7.159
+ - eval_enwikippl: 210.9362
+ - eval_frwikippl: 1217.4034
+ - eval_zhwikippl: 592.0721
+ - eval_loss: 1.2659
+ - eval_runtime: 34.5759
+ - eval_samples_per_second: 57.844
+ - eval_steps_per_second: 7.23
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -45,7 +45,7 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0, activations_loss_fn=(fn:soft_mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:soft_mse_loss()))
+ - distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0.1, activations_loss_fn=(fn:soft_mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:soft_mse_loss()))
  - train_embeddings: True
  - learning_rate: 4e-05
  - train_batch_size: 8
@@ -56,38 +56,38 @@ The following hyperparameters were used during training:
  - num_epochs: 1.0
 
  ### Resource Usage
- Peak GPU Memory: 7.9371 GB
+ Peak GPU Memory: 8.0873 GB
 
  ### Eval-Phase Metrics
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
- | 0 | 0 | 55232.0742 | 57228.8242 | 5.9266 | 34.236 | 58.418 | 7.302 | 60115.7617 |
- | 1000 | 0.0404 | 739.0357 | 4655.4023 | 2.0050 | 33.9843 | 58.851 | 7.356 | 16372.7070 |
- | 2000 | 0.0808 | 532.8664 | 3281.5566 | 1.8149 | 34.1449 | 58.574 | 7.322 | 2016.1978 |
- | 3000 | 0.1212 | 440.3088 | 2713.1023 | 1.7089 | 34.0465 | 58.743 | 7.343 | 1149.1477 |
- | 4000 | 0.1616 | 383.6159 | 2350.9609 | 1.6207 | 34.0823 | 58.682 | 7.335 | 1149.4546 |
- | 5000 | 0.2020 | 338.9797 | 1973.5388 | 1.5384 | 34.0344 | 58.764 | 7.345 | 964.7234 |
- | 6000 | 0.2424 | 294.0521 | 1662.3237 | 1.4628 | 34.0352 | 58.763 | 7.345 | 726.0898 |
- | 7000 | 0.2828 | 259.6556 | 1392.8989 | 1.3990 | 34.6694 | 57.688 | 7.211 | 985.8208 |
- | 8000 | 0.3232 | 237.2520 | 1358.5637 | 1.3424 | 34.7511 | 57.552 | 7.194 | 601.8773 |
- | 9000 | 0.3636 | 218.2680 | 1270.0094 | 1.2941 | 34.9208 | 57.272 | 7.159 | 675.2138 |
- | 10000 | 0.4040 | 199.4044 | 1168.2943 | 1.2452 | 34.5786 | 57.839 | 7.23 | 576.9305 |
- | 11000 | 0.4444 | 183.3348 | 1062.2205 | 1.1946 | 34.6379 | 57.74 | 7.218 | 719.8147 |
- | 12000 | 0.4848 | 171.0789 | 952.9263 | 1.1565 | 34.6369 | 57.742 | 7.218 | 629.7498 |
- | 13000 | 0.5253 | 159.3822 | 874.8159 | 1.1182 | 34.6139 | 57.78 | 7.223 | 845.4815 |
- | 14000 | 0.5657 | 152.7777 | 857.2919 | 1.0932 | 33.9072 | 58.984 | 7.373 | 728.5178 |
- | 15000 | 0.6061 | 145.9134 | 775.5083 | 1.0677 | 33.9606 | 58.892 | 7.361 | 552.7966 |
- | 16000 | 0.6465 | 139.7585 | 770.7659 | 1.0513 | 33.9337 | 58.938 | 7.367 | 511.8709 |
- | 17000 | 0.6869 | 136.6782 | 720.5764 | 1.0339 | 33.9201 | 58.962 | 7.37 | 610.5389 |
- | 18000 | 0.7273 | 132.8999 | 709.8857 | 1.0204 | 33.9379 | 58.931 | 7.366 | 323.7285 |
- | 19000 | 0.7677 | 130.9128 | 720.5255 | 1.0129 | 33.9104 | 58.979 | 7.372 | 413.1778 |
- | 20000 | 0.8081 | 131.1570 | 715.4633 | 1.0027 | 33.9308 | 58.943 | 7.368 | 440.1761 |
- | 21000 | 0.8485 | 125.3809 | 670.0075 | 0.9936 | 33.8473 | 59.089 | 7.386 | 480.4752 |
- | 22000 | 0.8889 | 126.8006 | 634.7371 | 0.9833 | 33.8416 | 59.099 | 7.387 | 304.7259 |
- | 23000 | 0.9293 | 124.9435 | 607.9310 | 0.9792 | 33.8527 | 59.079 | 7.385 | 334.6737 |
- | 24000 | 0.9697 | 121.6785 | 594.7874 | 0.9726 | 33.9592 | 58.894 | 7.362 | 365.3625 |
- | 24750 | 1.0 | 122.7032 | 590.0676 | 0.9684 | 33.9498 | 58.91 | 7.364 | 327.5989 |
+ | 0 | 0 | 54069.2930 | 57285.3438 | 5.9282 | 34.5126 | 57.95 | 7.244 | 54227.1016 |
+ | 1000 | 0.0404 | 716.9899 | 4690.9888 | 1.9692 | 34.4718 | 58.019 | 7.252 | 17110.3438 |
+ | 2000 | 0.0808 | 512.3346 | 3225.1304 | 1.7815 | 34.3496 | 58.225 | 7.278 | 1836.7620 |
+ | 3000 | 0.1212 | 424.6275 | 2762.1277 | 1.6690 | 34.3928 | 58.152 | 7.269 | 1142.4155 |
+ | 4000 | 0.1616 | 370.8741 | 2383.6748 | 1.5817 | 35.4649 | 56.394 | 7.049 | 856.8477 |
+ | 5000 | 0.2020 | 320.8709 | 1872.4182 | 1.5010 | 34.9089 | 57.292 | 7.161 | 899.5260 |
+ | 6000 | 0.2424 | 280.1215 | 1633.2791 | 1.4264 | 34.4484 | 58.058 | 7.257 | 1132.9962 |
+ | 7000 | 0.2828 | 256.0913 | 1481.0148 | 1.3681 | 34.4301 | 58.089 | 7.261 | 1089.3762 |
+ | 8000 | 0.3232 | 231.1957 | 1283.1497 | 1.3140 | 34.4149 | 58.114 | 7.264 | 868.5985 |
+ | 9000 | 0.3636 | 210.9362 | 1217.4034 | 1.2659 | 34.5759 | 57.844 | 7.23 | 592.0721 |
+ | 10000 | 0.4040 | 197.0646 | 1170.2731 | 1.2188 | 34.4158 | 58.113 | 7.264 | 508.8722 |
+ | 11000 | 0.4444 | 177.9348 | 1026.4390 | 1.1678 | 34.4006 | 58.139 | 7.267 | 657.0681 |
+ | 12000 | 0.4848 | 164.4871 | 933.5081 | 1.1267 | 34.4416 | 58.069 | 7.259 | 575.0845 |
+ | 13000 | 0.5253 | 153.7298 | 846.4210 | 1.0928 | 34.2985 | 58.312 | 7.289 | 558.6592 |
+ | 14000 | 0.5657 | 146.5948 | 780.8855 | 1.0625 | 34.3668 | 58.196 | 7.274 | 541.9786 |
+ | 15000 | 0.6061 | 141.3081 | 772.5070 | 1.0448 | 34.7524 | 57.55 | 7.194 | 660.9402 |
+ | 16000 | 0.6465 | 141.5167 | 705.8929 | 1.0329 | 35.1307 | 56.93 | 7.116 | 543.9362 |
+ | 17000 | 0.6869 | 135.4525 | 729.8825 | 1.0154 | 34.3922 | 58.153 | 7.269 | 460.1301 |
+ | 18000 | 0.7273 | 133.1375 | 703.1610 | 1.0063 | 34.6749 | 57.679 | 7.21 | 546.2656 |
+ | 19000 | 0.7677 | 129.7590 | 700.7360 | 0.9923 | 34.4165 | 58.112 | 7.264 | 359.0746 |
+ | 20000 | 0.8081 | 128.2863 | 660.0209 | 0.9843 | 34.3672 | 58.195 | 7.274 | 371.1157 |
+ | 21000 | 0.8485 | 125.8589 | 636.8441 | 0.9743 | 34.4249 | 58.097 | 7.262 | 532.6509 |
+ | 22000 | 0.8889 | 125.1572 | 607.6739 | 0.9691 | 34.3037 | 58.303 | 7.288 | 386.2334 |
+ | 23000 | 0.9293 | 123.7366 | 640.1755 | 0.9637 | 34.4018 | 58.137 | 7.267 | 350.3125 |
+ | 24000 | 0.9697 | 120.8405 | 591.9428 | 0.9533 | 34.3506 | 58.223 | 7.278 | 328.9578 |
+ | 24750 | 1.0 | 120.9155 | 655.1063 | 0.9532 | 35.0356 | 57.085 | 7.136 | 336.4213 |
 
  ### Framework versions
  - Distily 0.2.0
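The `distillation_objective` above weights a KL-divergence loss on the student and teacher logits (weight 1) together with a soft MSE loss on hidden activations (weight 0.1 in this commit, up from 0). The sketch below is only a minimal PyTorch approximation of how such a weighted multi-objective loss can be composed, not Distily's actual implementation; the function names and the plain-MSE treatment of `soft_mse_loss` are assumptions.

```python
import torch.nn.functional as F

def kl_divergence_loss(student_logits, teacher_logits, temperature=1.0):
    # KL(teacher || student) over the vocabulary, reduced with "batchmean".
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2

def soft_mse_loss(student_hidden, teacher_hidden):
    # Approximated here as plain MSE between corresponding hidden states.
    return F.mse_loss(student_hidden, teacher_hidden)

def multi_objective_loss(student_out, teacher_out,
                         logits_weight=1.0, activations_weight=0.1):
    # Weighted sum mirroring MultiObjective(logits_weight=1, activations_weight=0.1).
    loss = logits_weight * kl_divergence_loss(student_out.logits, teacher_out.logits)
    if activations_weight:
        # Requires both forward passes to be run with output_hidden_states=True.
        act_losses = [
            soft_mse_loss(s, t)
            for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
        ]
        loss = loss + activations_weight * sum(act_losses) / len(act_losses)
    return loss
```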
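The `eval_enwikippl`, `eval_frwikippl`, and `eval_zhwikippl` columns are perplexities on English, French, and Chinese Wikipedia evaluation text. A minimal sketch of computing a perplexity of this kind with `transformers` follows; the model id is a placeholder, and this is not the exact evaluation code used by Distily.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/this-student-model"  # placeholder: substitute this repo's id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def perplexity(text: str) -> float:
    # The causal LM loss is the mean negative log-likelihood per predicted token,
    # so exp(loss) gives the perplexity of the passage.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

print(perplexity("The quick brown fox jumps over the lazy dog."))
```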
logs/distillation_objective=MultiObjective(logits_weight_1__logits_loss_fn_(fn_kl_divergence_loss())__activations_weight_0.1__activations_loss_fn_(fn_soft_mse_loss())__attentions_weight_0__attentions_loss_/events.out.tfevents.1723485026.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f31b6f8f8d8b1f8f6dac7dd47c0a4902d9f04bfa2133b4adad48758ffd1ff78c
+ size 253