1bd9cceed2dcd10f2ece1070a2e20a3c

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [es-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8072
  • Data Size: 1.0
  • Epoch Runtime: 112.5041
  • Bleu: 4.3082
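
Below is a minimal inference sketch for Spanish→Italian translation with this checkpoint. It assumes the Hub repository id shown on this page (contemmcm/1bd9cceed2dcd10f2ece1070a2e20a3c) and that the fine-tuned model takes raw Spanish text without a task prefix; neither detail is confirmed by the card.

```python
# A minimal inference sketch, not an official example from this card.
# Assumptions: the Hub repo id below (taken from this page) and that the
# fine-tuned model maps raw Spanish input to Italian output with no task prefix.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/1bd9cceed2dcd10f2ece1070a2e20a3c"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "La vida es sueño."  # Spanish source sentence
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```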

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
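
The training script itself is not included in this card; the sketch below shows how the settings above could be expressed with Seq2SeqTrainingArguments in Transformers. The output directory name and predict_with_generate are assumptions, not documented values.

```python
# A hedged sketch of how the hyperparameters above could be expressed with
# Seq2SeqTrainingArguments; the actual training script is not part of this card.
# output_dir and predict_with_generate are assumptions, not documented values.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-es-it",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs -> total_train_batch_size 32
    per_device_eval_batch_size=8,    # x4 GPUs -> total_eval_batch_size 32
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed, so BLEU can be computed at eval time
)
```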

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 16.6425 | 0 | 10.0867 | 0.2928 |
| No log | 1 | 721 | 16.3540 | 0.0078 | 10.9510 | 0.2893 |
| No log | 2 | 1442 | 14.6128 | 0.0156 | 11.9953 | 0.3034 |
| 0.3458 | 3 | 2163 | 12.9897 | 0.0312 | 14.2539 | 0.3222 |
| 1.0187 | 4 | 2884 | 9.4280 | 0.0625 | 17.0721 | 0.3791 |
| 9.7765 | 5 | 3605 | 6.0200 | 0.125 | 23.5481 | 0.4979 |
| 6.4556 | 6 | 4326 | 4.5495 | 0.25 | 35.6383 | 1.6521 |
| 5.2523 | 7 | 5047 | 3.9789 | 0.5 | 61.2487 | 1.6179 |
| 4.508 | 8.0 | 5768 | 3.5314 | 1.0 | 112.2478 | 2.0289 |
| 4.1967 | 9.0 | 6489 | 3.3986 | 1.0 | 112.5871 | 2.3105 |
| 4.052 | 10.0 | 7210 | 3.3132 | 1.0 | 113.8064 | 2.5055 |
| 3.8898 | 11.0 | 7931 | 3.2623 | 1.0 | 112.9548 | 2.6756 |
| 3.8333 | 12.0 | 8652 | 3.2131 | 1.0 | 113.5531 | 2.7829 |
| 3.7702 | 13.0 | 9373 | 3.1821 | 1.0 | 113.0508 | 2.8880 |
| 3.6635 | 14.0 | 10094 | 3.1422 | 1.0 | 113.4677 | 3.0043 |
| 3.6578 | 15.0 | 10815 | 3.1133 | 1.0 | 113.1681 | 3.0899 |
| 3.5582 | 16.0 | 11536 | 3.0999 | 1.0 | 113.1700 | 3.1533 |
| 3.5449 | 17.0 | 12257 | 3.0735 | 1.0 | 114.2252 | 3.2124 |
| 3.5093 | 18.0 | 12978 | 3.0548 | 1.0 | 112.8411 | 3.2856 |
| 3.4384 | 19.0 | 13699 | 3.0419 | 1.0 | 113.2164 | 3.3314 |
| 3.4229 | 20.0 | 14420 | 3.0157 | 1.0 | 113.5167 | 3.3987 |
| 3.4119 | 21.0 | 15141 | 3.0014 | 1.0 | 113.0884 | 3.4310 |
| 3.3609 | 22.0 | 15862 | 2.9874 | 1.0 | 113.3006 | 3.5151 |
| 3.2723 | 23.0 | 16583 | 2.9811 | 1.0 | 114.8710 | 3.5543 |
| 3.2748 | 24.0 | 17304 | 2.9645 | 1.0 | 114.0400 | 3.6138 |
| 3.2806 | 25.0 | 18025 | 2.9625 | 1.0 | 113.1700 | 3.6308 |
| 3.2696 | 26.0 | 18746 | 2.9382 | 1.0 | 113.3355 | 3.6929 |
| 3.2254 | 27.0 | 19467 | 2.9330 | 1.0 | 112.4022 | 3.6982 |
| 3.2108 | 28.0 | 20188 | 2.9252 | 1.0 | 113.1494 | 3.7675 |
| 3.1536 | 29.0 | 20909 | 2.9150 | 1.0 | 113.0551 | 3.8057 |
| 3.1271 | 30.0 | 21630 | 2.9039 | 1.0 | 113.0676 | 3.8281 |
| 3.1324 | 31.0 | 22351 | 2.9001 | 1.0 | 113.4059 | 3.8688 |
| 3.1245 | 32.0 | 23072 | 2.8917 | 1.0 | 114.1657 | 3.9119 |
| 3.0853 | 33.0 | 23793 | 2.8821 | 1.0 | 113.8688 | 3.9384 |
| 3.025 | 34.0 | 24514 | 2.8809 | 1.0 | 113.4756 | 3.9585 |
| 3.0303 | 35.0 | 25235 | 2.8723 | 1.0 | 112.4681 | 3.9852 |
| 3.0046 | 36.0 | 25956 | 2.8594 | 1.0 | 113.2854 | 3.9970 |
| 2.9943 | 37.0 | 26677 | 2.8579 | 1.0 | 113.8893 | 4.0160 |
| 2.9874 | 38.0 | 27398 | 2.8528 | 1.0 | 112.7151 | 4.0269 |
| 2.9358 | 39.0 | 28119 | 2.8503 | 1.0 | 113.6051 | 4.0450 |
| 2.9332 | 40.0 | 28840 | 2.8432 | 1.0 | 112.3515 | 4.0958 |
| 2.9513 | 41.0 | 29561 | 2.8370 | 1.0 | 113.1157 | 4.1324 |
| 2.9465 | 42.0 | 30282 | 2.8293 | 1.0 | 112.8311 | 4.1777 |
| 2.8816 | 43.0 | 31003 | 2.8295 | 1.0 | 114.5790 | 4.1632 |
| 2.8867 | 44.0 | 31724 | 2.8162 | 1.0 | 114.1613 | 4.1918 |
| 2.8684 | 45.0 | 32445 | 2.8202 | 1.0 | 112.4990 | 4.2068 |
| 2.8588 | 46.0 | 33166 | 2.8130 | 1.0 | 113.9478 | 4.2499 |
| 2.817 | 47.0 | 33887 | 2.8068 | 1.0 | 113.8720 | 4.2553 |
| 2.8057 | 48.0 | 34608 | 2.8122 | 1.0 | 112.9857 | 4.2949 |
| 2.8197 | 49.0 | 35329 | 2.8053 | 1.0 | 112.6193 | 4.3000 |
| 2.8217 | 50.0 | 36050 | 2.8072 | 1.0 | 112.5041 | 4.3082 |
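
The Bleu column above is a corpus-level BLEU score on the evaluation set, rising from 0.29 at epoch 0 to 4.3082 at epoch 50. The card does not say how the metric was computed; a common choice is SacreBLEU through the `evaluate` library, sketched below with placeholder sentences.

```python
# A sketch of scoring translations with SacreBLEU via the `evaluate` library.
# The card does not document the exact metric setup, so treat this as an
# assumption; the sentences below are placeholders, not dataset examples.
import evaluate

sacrebleu = evaluate.load("sacrebleu")
predictions = ["La vita è sogno."]        # model outputs (Italian)
references = [["La vita è un sogno."]]    # one list of references per prediction
result = sacrebleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))          # corpus-level BLEU score
```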

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1