|
--- |
|
library_name: peft |
|
license: other |
|
base_model: deepseek-ai/deepseek-coder-1.3b-base |
|
tags: |
|
- generated_from_trainer |
|
model-index: |
|
- name: lemexp-task1-min_symbols_template_small-deepseek-coder-1.3b-base-ddp |
|
results: [] |
|
--- |
|
|
|
|
|
|
# lemexp-task1-min_symbols_template_small-deepseek-coder-1.3b-base-ddp |
|
|
|
This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on an unknown dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.2065 |
|
|
|
## Model description |
|
|
|
More information needed |
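
Pending a fuller description, the sketch below shows one plausible way to load this PEFT adapter on top of the base model. The adapter repo id and the example prompt are assumptions for illustration, not documented usage.

```python
# Minimal loading sketch (assumptions: this repo hosts a PEFT adapter for
# deepseek-ai/deepseek-coder-1.3b-base; the repo id and prompt are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "lemexp-task1-min_symbols_template_small-deepseek-coder-1.3b-base-ddp"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

prompt = "def min_symbols("  # illustrative only; the task's prompt format is undocumented
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```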
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
|
- learning_rate: 0.0002 |
|
- train_batch_size: 2 |
|
- eval_batch_size: 2 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 8 |
|
- total_train_batch_size: 16 |
|
- total_eval_batch_size: 16 |
|
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
|
- lr_scheduler_type: linear |
|
- num_epochs: 6 |
|
- mixed_precision_training: Native AMP |
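
As a reference, here is a minimal sketch of how these settings might map onto `transformers.TrainingArguments`. The output directory is a placeholder, and the per-device batch size of 2 across 8 GPUs under DDP reproduces the total train batch size of 16; the AdamW betas and epsilon listed above are the library defaults.

```python
# Hedged mapping of the listed hyperparameters onto TrainingArguments.
# output_dir is a placeholder, not taken from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lemexp-task1-min_symbols_template_small-deepseek-coder-1.3b-base-ddp",
    learning_rate=2e-4,
    per_device_train_batch_size=2,   # 2 per device x 8 GPUs = total 16
    per_device_eval_batch_size=2,
    seed=42,
    num_train_epochs=6,
    lr_scheduler_type="linear",
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-8 (defaults)
    fp16=True,                       # "Native AMP" mixed-precision training
)
```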
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:------:|:-----:|:---------------:| |
|
| 0.5153 | 0.2001 | 629 | 0.3804 | |
|
| 0.3737 | 0.4001 | 1258 | 0.3328 | |
|
| 0.3390        | 0.6002 | 1887  | 0.3059          |
|
| 0.3011 | 0.8003 | 2516 | 0.2860 | |
|
| 0.2889 | 1.0003 | 3145 | 0.2774 | |
|
| 0.2715 | 1.2004 | 3774 | 0.2685 | |
|
| 0.2638 | 1.4004 | 4403 | 0.2573 | |
|
| 0.2513 | 1.6005 | 5032 | 0.2510 | |
|
| 0.2493 | 1.8006 | 5661 | 0.2448 | |
|
| 0.2416 | 2.0006 | 6290 | 0.2400 | |
|
| 0.2359 | 2.2007 | 6919 | 0.2365 | |
|
| 0.2247 | 2.4008 | 7548 | 0.2334 | |
|
| 0.2204 | 2.6008 | 8177 | 0.2292 | |
|
| 0.2208 | 2.8009 | 8806 | 0.2235 | |
|
| 0.2157 | 3.0010 | 9435 | 0.2226 | |
|
| 0.1976 | 3.2010 | 10064 | 0.2208 | |
|
| 0.1991 | 3.4011 | 10693 | 0.2209 | |
|
| 0.1982 | 3.6011 | 11322 | 0.2157 | |
|
| 0.1977 | 3.8012 | 11951 | 0.2140 | |
|
| 0.1949 | 4.0013 | 12580 | 0.2121 | |
|
| 0.1821 | 4.2013 | 13209 | 0.2135 | |
|
| 0.1791 | 4.4014 | 13838 | 0.2106 | |
|
| 0.1829 | 4.6015 | 14467 | 0.2089 | |
|
| 0.1770        | 4.8015 | 15096 | 0.2085          |
|
| 0.1789 | 5.0016 | 15725 | 0.2063 | |
|
| 0.1704 | 5.2017 | 16354 | 0.2083 | |
|
| 0.1667 | 5.4017 | 16983 | 0.2074 | |
|
| 0.1641 | 5.6018 | 17612 | 0.2068 | |
|
| 0.1642 | 5.8018 | 18241 | 0.2065 | |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.14.0 |
|
- Transformers 4.47.0 |
|
- Pytorch 2.5.1+cu124 |
|
- Datasets 3.2.0 |
|
- Tokenizers 0.21.0 |