yalhessi/lemexp-task1-min_symbols_template_small-deepseek-coder-1.3b-base-ddp

	---
	library_name: peft
	license: other
	base_model: deepseek-ai/deepseek-coder-1.3b-base
	tags:
	- generated_from_trainer
	model-index:
	- name: lemexp-task1-min_symbols_template_small-deepseek-coder-1.3b-base-ddp
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# lemexp-task1-min_symbols_template_small-deepseek-coder-1.3b-base-ddp

	This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.1887
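Because the card metadata lists `library_name: peft`, the weights in this repository are presumably a PEFT adapter for the base model rather than a full checkpoint. The following is a minimal loading sketch under that assumption. The adapter id is assembled from the owner and model name shown at the top of this page; the helper names `load_model` and `generate`, the default `max_new_tokens`, and the idea of passing a free-form prompt are illustrative only, since the prompt format used during fine-tuning is not documented here.

```python
BASE_MODEL_ID = "deepseek-ai/deepseek-coder-1.3b-base"
# Assumption: owner/name as shown on the model page.
ADAPTER_ID = "yalhessi/lemexp-task1-min_symbols_template_small-deepseek-coder-1.3b-base-ddp"


def load_model(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Hypothetical helper: load the base model and attach the PEFT adapter."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Assumption: the repository holds adapter weights compatible with the base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: run greedy generation on a single prompt."""
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("...")` downloads both the base model and the adapter on first use, so it needs network access and enough memory for a 1.3B-parameter model.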

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- total_train_batch_size: 16
	- total_eval_batch_size: 16
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 6
	- mixed_precision_training: Native AMP
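The relationship between the per-device and total batch sizes listed above, and the shape of a linear learning-rate schedule, can be sketched in plain Python. This is an illustration under stated assumptions, not the training script: gradient accumulation is not listed, so it is assumed to be 1 (consistent with 2 × 8 = 16), and no warmup is listed, so `warmup_steps` defaults to 0. The function names and the `total_steps` value in the usage note are hypothetical.

```python
def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Number of examples contributing to one optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup (optional) followed by linear decay to zero at total_steps."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup period.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr (at the end of warmup) to 0 (at total_steps).
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the hyperparameter list; grad_accum_steps=1 is an assumption.
total_train_batch_size = effective_batch_size(per_device=2, num_devices=8)
```

For example, with a hypothetical `total_steps=1000`, `linear_lr(500, 1000, 1e-4)` gives half the initial learning rate, and the rate reaches zero at the final step.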

	### Training results

	| Training Loss | Epoch  | Step  | Validation Loss |
	|:-------------:|:------:|:-----:|:---------------:|
	| 0.4557 | 0.2001 | 629 | 0.3633 |
	| 0.3618 | 0.4001 | 1258 | 0.3296 |
	| 0.3388 | 0.6002 | 1887 | 0.3132 |
	| 0.3131 | 0.8003 | 2516 | 0.2975 |
	| 0.3022 | 1.0003 | 3145 | 0.2878 |
	| 0.2884 | 1.2004 | 3774 | 0.2849 |
	| 0.2806 | 1.4004 | 4403 | 0.2791 |
	| 0.2695 | 1.6005 | 5032 | 0.2651 |
	| 0.2684 | 1.8006 | 5661 | 0.2560 |
	| 0.261 | 2.0006 | 6290 | 0.2564 |
	| 0.2544 | 2.2007 | 6919 | 0.2513 |
	| 0.2437 | 2.4008 | 7548 | 0.2441 |
	| 0.2393 | 2.6008 | 8177 | 0.2406 |
	| 0.2375 | 2.8009 | 8806 | 0.2338 |
	| 0.2326 | 3.0010 | 9435 | 0.2257 |
	| 0.2124 | 3.2010 | 10064 | 0.2227 |
	| 0.2137 | 3.4011 | 10693 | 0.2215 |
	| 0.2102 | 3.6011 | 11322 | 0.2127 |
	| 0.2079 | 3.8012 | 11951 | 0.2103 |
	| 0.2034 | 4.0013 | 12580 | 0.2070 |
	| 0.1862 | 4.2013 | 13209 | 0.2049 |
	| 0.1831 | 4.4014 | 13838 | 0.2029 |
	| 0.185 | 4.6015 | 14467 | 0.1987 |
	| 0.1754 | 4.8015 | 15096 | 0.1975 |
	| 0.1753 | 5.0016 | 15725 | 0.1937 |
	| 0.1622 | 5.2017 | 16354 | 0.1959 |
	| 0.155 | 5.4017 | 16983 | 0.1912 |
	| 0.1501 | 5.6018 | 17612 | 0.1897 |
	| 0.1481 | 5.8018 | 18241 | 0.1887 |
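A few derived quantities can be read off the table with simple arithmetic. The sketch below uses only the first and last logged rows and is an estimate rather than a reported result: it assumes the validation loss is a mean token-level cross-entropy in nats (so that its exponential is a perplexity), and it assumes one optimizer step consumes 16 examples with no gradient accumulation. The variable and function names are illustrative.

```python
import math

# (epoch, step, validation_loss) copied from the first and last rows of the table.
FIRST_ROW = (0.2001, 629, 0.3633)
LAST_ROW = (5.8018, 18241, 0.1887)
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list


def steps_per_epoch(epoch: float, step: int) -> float:
    """Estimate optimizer steps per epoch from one logged (epoch, step) pair."""
    return step / epoch


def perplexity(loss: float) -> float:
    """Perplexity, assuming the loss is mean cross-entropy in nats."""
    return math.exp(loss)


# The last row gives the most precise ratio because the epoch value is rounded.
est_steps_per_epoch = steps_per_epoch(LAST_ROW[0], LAST_ROW[1])
# Rough training-set size, assuming every step sees a full batch of 16 examples.
est_examples_per_epoch = est_steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE
final_perplexity = perplexity(LAST_ROW[2])
# Relative drop in validation loss between the first and last evaluation.
relative_improvement = (FIRST_ROW[2] - LAST_ROW[2]) / FIRST_ROW[2]

print(f"steps/epoch ~ {est_steps_per_epoch:.0f}")
print(f"examples/epoch ~ {est_examples_per_epoch:.0f}")
print(f"final validation perplexity ~ {final_perplexity:.4f}")
print(f"relative validation-loss reduction ~ {relative_improvement:.1%}")
```

Under these assumptions, the run works out to a little over 3,100 optimizer steps per epoch, which corresponds to roughly 50,000 training examples.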


	### Framework versions

	- PEFT 0.14.0
	- Transformers 4.47.0
	- PyTorch 2.5.1+cu124
	- Datasets 3.2.0
	- Tokenizers 0.21.0