	---
	library_name: peft
	license: other
	base_model: deepseek-ai/deepseek-coder-1.3b-base
	tags:
	- generated_from_trainer
	model-index:
	- name: lemexp-task1-v2-lemma_object_small-deepseek-coder-1.3b-base-ddp-8lr-v2
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# lemexp-task1-v2-lemma_object_small-deepseek-coder-1.3b-base-ddp-8lr-v2

	This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.2352

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0008
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- total_train_batch_size: 16
	- total_eval_batch_size: 16
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 12
	- mixed_precision_training: Native AMP
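
The batch-size and scheduler entries above fit together as follows. This is a minimal sketch, assuming no gradient accumulation and no warmup (neither is listed above) and a linear schedule that decays the learning rate to zero by the last step. The helper names are illustrative, and the figure of about 3602 steps per epoch is an estimate read off the results table, not a logged value.

```python
# Hyperparameters as listed in this card.
LEARNING_RATE = 0.0008
PER_DEVICE_TRAIN_BATCH_SIZE = 2
NUM_DEVICES = 8
NUM_EPOCHS = 12

# Assumption: about 3602 optimizer steps per epoch, estimated from the
# results table (step 3605 is logged at epoch 1.0008).
EST_STEPS_PER_EPOCH = 3602


def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup, then linear decay to zero (warmup is 0 by assumption)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = EST_STEPS_PER_EPOCH * NUM_EPOCHS
print(total_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, NUM_DEVICES))  # 16
print(total_steps)  # 43224
print(linear_lr(total_steps // 2, total_steps, LEARNING_RATE))
```

Under these assumptions the learning rate is roughly half its initial value midway through training and reaches zero at the final step.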

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-------:|:-----:|:---------------:|
	| 0.588 | 0.2002 | 721 | 0.4356 |
	| 0.4481 | 0.4003 | 1442 | 0.4067 |
	| 0.397 | 0.6005 | 2163 | 0.3795 |
	| 0.3841 | 0.8007 | 2884 | 0.3673 |
	| 0.3666 | 1.0008 | 3605 | 0.3445 |
	| 0.3409 | 1.2010 | 4326 | 0.3416 |
	| 0.3347 | 1.4012 | 5047 | 0.3323 |
	| 0.3285 | 1.6013 | 5768 | 0.3313 |
	| 0.3288 | 1.8015 | 6489 | 0.3217 |
	| 0.3232 | 2.0017 | 7210 | 0.3111 |
	| 0.3048 | 2.2018 | 7931 | 0.3151 |
	| 0.3004 | 2.4020 | 8652 | 0.3155 |
	| 0.2994 | 2.6022 | 9373 | 0.3052 |
	| 0.2939 | 2.8023 | 10094 | 0.3023 |
	| 0.2948 | 3.0025 | 10815 | 0.3001 |
	| 0.2702 | 3.2027 | 11536 | 0.2995 |
	| 0.2707 | 3.4028 | 12257 | 0.2913 |
	| 0.2696 | 3.6030 | 12978 | 0.2870 |
	| 0.2729 | 3.8032 | 13699 | 0.2859 |
	| 0.2697 | 4.0033 | 14420 | 0.2853 |
	| 0.2497 | 4.2035 | 15141 | 0.2815 |
	| 0.2497 | 4.4037 | 15862 | 0.2768 |
	| 0.2503 | 4.6038 | 16583 | 0.2748 |
	| 0.2505 | 4.8040 | 17304 | 0.2728 |
	| 0.2491 | 5.0042 | 18025 | 0.2706 |
	| 0.2254 | 5.2043 | 18746 | 0.2711 |
	| 0.2274 | 5.4045 | 19467 | 0.2669 |
	| 0.2283 | 5.6047 | 20188 | 0.2661 |
	| 0.2294 | 5.8048 | 20909 | 0.2616 |
	| 0.2304 | 6.0050 | 21630 | 0.2598 |
	| 0.2099 | 6.2052 | 22351 | 0.2619 |
	| 0.2101 | 6.4053 | 23072 | 0.2603 |
	| 0.2098 | 6.6055 | 23793 | 0.2579 |
	| 0.2105 | 6.8057 | 24514 | 0.2530 |
	| 0.2085 | 7.0058 | 25235 | 0.2564 |
	| 0.1949 | 7.2060 | 25956 | 0.2512 |
	| 0.1914 | 7.4062 | 26677 | 0.2459 |
	| 0.1901 | 7.6063 | 27398 | 0.2432 |
	| 0.1892 | 7.8065 | 28119 | 0.2481 |
	| 0.1893 | 8.0067 | 28840 | 0.2445 |
	| 0.1676 | 8.2068 | 29561 | 0.2409 |
	| 0.1706 | 8.4070 | 30282 | 0.2376 |
	| 0.1713 | 8.6072 | 31003 | 0.2342 |
	| 0.1711 | 8.8073 | 31724 | 0.2323 |
	| 0.1691 | 9.0075 | 32445 | 0.2403 |
	| 0.1485 | 9.2077 | 33166 | 0.2380 |
	| 0.1483 | 9.4078 | 33887 | 0.2373 |
	| 0.152 | 9.6080 | 34608 | 0.2343 |
	| 0.1523 | 9.8082 | 35329 | 0.2318 |
	| 0.1505 | 10.0083 | 36050 | 0.2356 |
	| 0.1311 | 10.2085 | 36771 | 0.2393 |
	| 0.1316 | 10.4087 | 37492 | 0.2324 |
	| 0.1324 | 10.6088 | 38213 | 0.2310 |
	| 0.1334 | 10.8090 | 38934 | 0.2324 |
	| 0.1305 | 11.0092 | 39655 | 0.2367 |
	| 0.1194 | 11.2093 | 40376 | 0.2374 |
	| 0.1164 | 11.4095 | 41097 | 0.2376 |
	| 0.1181 | 11.6097 | 41818 | 0.2369 |
	| 0.117 | 11.8098 | 42539 | 0.2352 |
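
The final row is not the lowest validation loss in the table. The sketch below picks the best evaluation step from a handful of rows copied from the table above; only a subset is included to keep it short, and the function and variable names are illustrative.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above.
# Only a subset of rows is included here.
ROWS = [
    (0.5880, 0.2002, 721, 0.4356),
    (0.3666, 1.0008, 3605, 0.3445),
    (0.2491, 5.0042, 18025, 0.2706),
    (0.1711, 8.8073, 31724, 0.2323),
    (0.1523, 9.8082, 35329, 0.2318),
    (0.1324, 10.6088, 38213, 0.2310),
    (0.1334, 10.8090, 38934, 0.2324),
    (0.1170, 11.8098, 42539, 0.2352),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[3])


def generalization_gap(row):
    """Validation loss minus training loss for one logged row."""
    train_loss, _epoch, _step, val_loss = row
    return val_loss - train_loss


best = best_row(ROWS)
final = ROWS[-1]
print(f"best step: {best[2]} (epoch {best[1]}), validation loss {best[3]:.4f}")
print(f"final step: {final[2]}, validation loss {final[3]:.4f}")
print(f"final train/validation gap: {generalization_gap(final):.4f}")
```

On these rows the lowest validation loss (0.2310) occurs at step 38213, while the loss reported at the top of this card (0.2352) is the value from the last logged step. After roughly epoch 9 the training loss keeps falling while the validation loss stays between about 0.231 and 0.240.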


	### Framework versions

	- PEFT 0.14.0
	- Transformers 4.47.0
	- PyTorch 2.5.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0
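
To compare a local environment against the versions listed above, something like the following may help. It is a sketch using only the standard library; the distribution names (`torch` for PyTorch, and so on) are the usual PyPI names and are an assumption here, as is the choice to ignore local build tags such as `+cu124` when comparing.

```python
from importlib import metadata

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "peft": "0.14.0",
    "transformers": "4.47.0",
    "torch": "2.5.1",  # the card lists the CUDA 12.4 build, 2.5.1+cu124
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def find_mismatches(expected, lookup=metadata.version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu124" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches(EXPECTED).items():
    print(f"{name}: card lists {wanted}, found {found or 'not installed'}")
```

The script prints nothing when every listed package is installed at the listed version.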
