	---
	language: en
	tags:
	- exbert

	---


	# GPT-2

	This is a more up-to-date version of the original GPT2: a model pretrained on English text with a causal language modeling (CLM) objective.
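
To make the CLM objective concrete, here is a minimal sketch of the quantity such training minimizes: the average negative log-likelihood of each token given the tokens before it. The function names `clm_loss` and `perplexity` and the probabilities are hypothetical, chosen only for illustration; they are not taken from this model or its training code.

```python
import math

def clm_loss(next_token_probs):
    """Mean negative log-likelihood of the observed next tokens.

    next_token_probs[i] is the probability the model assigned to the token
    that actually appeared at position i + 1, given tokens 0..i.
    """
    if not next_token_probs:
        raise ValueError("need at least one predicted token")
    return -sum(math.log(p) for p in next_token_probs) / len(next_token_probs)

def perplexity(next_token_probs):
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(clm_loss(next_token_probs))

# Hypothetical probabilities for a four-token continuation.
probs = [0.5, 0.25, 0.5, 0.25]
loss = clm_loss(probs)
ppl = perplexity(probs)
print(f"loss={loss:.4f} perplexity={ppl:.4f}")
```

A lower loss means the model put more probability on the tokens that actually followed, and exponentiating the loss gives a perplexity.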

	## Intended uses & limitations

	You can use the raw model for text generation or fine-tune it to a downstream task. See the

	## How to use

	You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we
	set a seed for reproducibility:

	```python
	>>> from transformers import pipeline, set_seed
	>>> generator = pipeline('text-generation', model='olm/olm-gpt2-oct-2022')
	>>> set_seed(42)
	>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
	```

	Here is how to use this model to get the features of a given text in PyTorch:

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	tokenizer = AutoTokenizer.from_pretrained('olm/olm-gpt2-oct-2022')
	model = AutoModelForCausalLM.from_pretrained('olm/olm-gpt2-oct-2022')
	text = "Replace me by any text you'd like."
	encoded_input = tokenizer(text, return_tensors='pt')
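	# Request hidden states as well; by default the output only carries logits.
	output = model(**encoded_input, output_hidden_states=True)
	features = output.hidden_states[-1]  # last-layer features: (batch, seq_len, hidden_size)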
	```

	## Dataset

	The model and tokenizer were trained with this [October 2022 cleaned Common Crawl dataset](https://huggingface.co/datasets/olm/olm-CC-MAIN-2022-40-sampling-ratio-0.15894621295) plus this [October 2022 cleaned Wikipedia dataset](https://huggingface.co/datasets/olm/olm-wikipedia-20221001).
	The tokenized version of these concatenated datasets is [here](https://huggingface.co/datasets/olm/olm-october-2022-tokenized-1024).
	The datasets were created with this [repo](https://github.com/huggingface/olm-datasets).

	## Training

	The model was trained according to the GPT2 instructions at this [repo](https://github.com/huggingface/olm-training).

	## Evaluation results

	The model achieves the following results without any fine-tuning (zero-shot):

	| Task          | Metric       | Original GPT2  | OLM GPT2 (Ours) | Significance (two-tailed p-value) |
	|:--------------|:-------------|---------------:|----------------:|----------------------------------:|
	| rte           | acc          | 0.5307         | 0.5415          | 0.7188                            |
	| piqa          | acc/acc_norm | 0.6289/0.6251  | 0.6638/0.6670   | 0.0020/0.0002                     |
	| copa          | acc          | 0.6400         | 0.6900          | 0.3000                            |
	| record        | f1/em        | 0.7094/0.7026  | 0.6874/0.6810   | 0.0000/0.0000                     |
	| boolq         | acc          | 0.4872         | 0.5606          | 0.0000                            |
	| cb            | acc/f1       | 0.4101/0.2619  | 0.3571/0.1754   | 0.4193/NA                         |
	| hellaswag     | acc/acc_norm | 0.2892/0.3114  | 0.3076/0.3491   | 0.0000/0.0000                     |
	| mrpc          | acc/f1       | 0.5662/0.6911  | 0.6495/0.7741   | 0.0007/0.0002                     |
	| multirc       | acc          | 0.0189         | 0.0115          | 0.0959                            |
	| lambada       | ppl/acc      | 40.0554/0.3256 | 28.6733/0.3625  | 0.0000/0.0000                     |
	| wsc           | acc          | 0.4327         | 0.3654          | 0.1679                            |
	| wic           | acc          | 0.4922         | 0.5             | 0.6924                            |
	| mnli          | acc          | 0.3372         | 0.3471          | 0.0384                            |
	| qnli          | acc          | 0.5017         | 0.4981          | 0.5884                            |
	| cola          | mcc          | 0.0126         | 0.0181          | 0.8614                            |
	| triviaqa      | acc          | 0.0151         | 0.0182          | 0.0048                            |
	| winogrande    | acc          | 0.5162         | 0.5114          | 0.7360                            |
	| webqs         | acc          | 0.0030         | 0.0108          | 0.0000                            |
	| arc_easy      | acc/acc_norm | 0.4381/0.3948  | 0.4651/0.4247   | 0.0082/0.0029                     |
	| arc_challenge | acc/acc_norm | 0.1903/0.2270  | 0.1997/0.2329   | 0.4132/0.6256                     |

	To get these results, we used the [EleutherAI evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).
	The harness can produce results that differ slightly from those reported in the GPT2 paper.
	The p-values are computed from the stderr reported by the evaluation harness, assuming a normal distribution.
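
As a rough illustration of how a stderr and a normal distribution assumption yield a p-value, the sketch below converts a score difference into a two-tailed p-value. The function name `two_tailed_p_value`, the optional second stderr, and the stderr value 0.0482 are assumptions for illustration; this card does not list the stderr values or say how the two models' errors were combined.

```python
import math

def two_tailed_p_value(score_a, score_b, stderr_a, stderr_b=0.0):
    """Two-tailed p-value for the difference between two scores.

    Assumes the difference is normally distributed. If both standard errors
    are given, they are combined in quadrature (independent errors).
    """
    stderr = math.hypot(stderr_a, stderr_b)
    if stderr <= 0:
        raise ValueError("stderr must be positive")
    z = abs(score_a - score_b) / stderr
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

# copa accuracies from the table; the stderr is a hypothetical value.
p = two_tailed_p_value(0.6400, 0.6900, stderr_a=0.0482)
print(f"p-value: {p:.4f}")
```

Under these assumptions, a p-value below a chosen threshold such as 0.05 suggests the difference between the two models is unlikely to be noise, while a larger value means the task cannot separate them.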