BluebrainAI/parallel-mean-bottleneck-gpt2-medium-wikitext


	---
	library_name: transformers
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	- bleu
	model-index:
	- name: parallel-mean-bottleneck-gpt2-medium-wikitext
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# parallel-mean-bottleneck-gpt2-medium-wikitext

	This model is a fine-tuned version of an unspecified base model on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 3.1859
	- Accuracy: 0.4194
	- Perplexity: 24.1889
	- Bleu: 0.1461
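
The reported perplexity is consistent with the exponential of the evaluation loss. The short check below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling but is not stated in this card.

```python
import math

eval_loss = 3.1859      # "Loss" reported above
reported_ppl = 24.1889  # "Perplexity" reported above

# Perplexity = exp(mean cross-entropy), assuming the loss is measured in nats.
perplexity = math.exp(eval_loss)

# The two values agree up to the rounding of the four-decimal loss.
print(f"exp(loss) = {perplexity:.2f}, reported = {reported_ppl:.2f}")
```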

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 64
	- eval_batch_size: 64
	- seed: 42
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
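
To illustrate what `lr_scheduler_type: linear` with `lr_scheduler_warmup_ratio: 0.1` implies, here is a minimal pure-Python sketch of a linear warmup followed by a linear decay to zero. The function name `linear_warmup_lr` is hypothetical, and `TOTAL_STEPS` is an estimate inferred from the logged epoch fractions, not a value stated in this card. The actual schedule used by the Trainer may differ in rounding details.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: roughly 8910 optimizer steps over 5 epochs (an estimate only).
TOTAL_STEPS = 8910

for step in (0, 500, 1000, 5000, 8500):
    print(step, f"{linear_warmup_lr(step, TOTAL_STEPS):.2e}")
```

Under this sketch the learning rate peaks at 0.0001 after about 10% of training and is close to zero by the last logged step.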

	### Training results

	| Training Loss | Epoch  | Step | Accuracy | Bleu   | Validation Loss | Perplexity |
	|:-------------:|:------:|:----:|:--------:|:------:|:---------------:|:----------:|
	| 6.0432        | 0.2806 | 500  | 0.1909   | 0.0378 | 5.9180          | 371.6605   |
	| 5.0476        | 0.5612 | 1000 | 0.2633   | 0.0612 | 4.8985          | 134.0910   |
	| 4.3528        | 0.8418 | 1500 | 0.3182   | 0.0834 | 4.2398          | 69.3933    |
	| 3.9497        | 1.1223 | 2000 | 0.3520   | 0.1054 | 3.8879          | 48.8078    |
	| 3.7614        | 1.4029 | 2500 | 0.3674   | 0.1207 | 3.7128          | 40.9670    |
	| 3.6543        | 1.6835 | 3000 | 0.3780   | 0.1310 | 3.5902          | 36.2404    |
	| 3.5527        | 1.9641 | 3500 | 0.3864   | 0.1337 | 3.5048          | 33.2757    |
	| 3.4348        | 2.2447 | 4000 | 0.3923   | 0.1361 | 3.4401          | 31.1898    |
	| 3.3739        | 2.5253 | 4500 | 0.3974   | 0.1419 | 3.3868          | 29.5718    |
	| 3.3441        | 2.8058 | 5000 | 0.4020   | 0.1394 | 3.3419          | 28.2718    |
	| 3.2252        | 3.0864 | 5500 | 0.4057   | 0.1432 | 3.3067          | 27.2940    |
	| 3.2188        | 3.3670 | 6000 | 0.4088   | 0.1421 | 3.2775          | 26.5107    |
	| 3.1971        | 3.6476 | 6500 | 0.4115   | 0.1426 | 3.2502          | 25.7958    |
	| 3.1722        | 3.9282 | 7000 | 0.4143   | 0.1446 | 3.2266          | 25.1936    |
	| 3.1052        | 4.2088 | 7500 | 0.4163   | 0.1433 | 3.2103          | 24.7864    |
	| 3.0672        | 4.4893 | 8000 | 0.4180   | 0.1438 | 3.1967          | 24.4514    |
	| 3.0774        | 4.7699 | 8500 | 0.4194   | 0.1461 | 3.1859          | 24.1889    |
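
The Epoch and Step columns allow a rough estimate of the number of optimizer steps per epoch, and from that the approximate size of the training set. The arithmetic below is a sketch: it assumes a single device and no gradient accumulation, neither of which is confirmed by this card, so the example count is only an order-of-magnitude figure.

```python
# Last logged row of the table: step 8500 at epoch 4.7699.
last_step, last_epoch = 8500, 4.7699
num_epochs = 5          # from the hyperparameter list
train_batch_size = 64   # from the hyperparameter list

# Optimizer steps per epoch implied by the log.
steps_per_epoch = last_step / last_epoch

# Estimated total optimizer steps over the full run.
total_steps = round(steps_per_epoch * num_epochs)

# Assumption: one device and no gradient accumulation, so one step = one batch.
approx_train_examples = round(steps_per_epoch) * train_batch_size

print(f"steps per epoch ~ {steps_per_epoch:.0f}")
print(f"total steps     ~ {total_steps}")
print(f"train examples  ~ {approx_train_examples}")
```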


	### Framework versions

	- Transformers 4.49.0
	- Pytorch 2.6.0+cu124
	- Datasets 3.3.2
	- Tokenizers 0.21.0