---
license: apache-2.0
model-index:
- name: Graphcore/gptj-mnli
  results:
  - task:
      type: text-classification
    dataset:
      type: glue
      config: mnli
      name: glue mnli mismatched
      split: validation_mismatched
    metrics:
      - type: accuracy
        value: 82.5
datasets:
- glue
tags:
- pytorch
- causal-lm
- text-classification
- text-generation
widget:
- text: "mnli hypothesis: Your contributions were of no help with our students' education. premise: Your contribution helped make it possible for us to provide our students with a quality education. target:"
---
# Graphcore/gptj-mnli
This model is a fine-tuned version of [EleutherAI/gpt-j-6B](https://huggingface.co/EleutherAI/gpt-j-6B) on the [MNLI dataset](https://huggingface.co/datasets/multi_nli).
The MNLI dataset consists of pairs of sentences, a *premise* and a *hypothesis*.
The task is to predict the relation between the premise and the hypothesis, which can be:
- `entailment`: the hypothesis follows from the premise,
- `contradiction`: the hypothesis contradicts the premise,
- `neutral`: the hypothesis neither follows from nor contradicts the premise.
We fine-tune the model as a causal language model (CLM): given a sequence of tokens, the task is to predict the next token.
To achieve this, we create a stylised prompt string, following the approach of the [T5 paper](https://arxiv.org/pdf/1910.10683.pdf):
```
mnli hypothesis: {hypothesis} premise: {premise} target: {class_label} <|endoftext|>
```
For example:
```
mnli hypothesis: Your contributions were of no help with our students' education. premise: Your contribution helped make it possible for us to provide our students with a quality education. target: contradiction <|endoftext|>
```
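The template can also be filled in programmatically. The sketch below is a minimal illustration: the helper name `build_prompt` is hypothetical and not part of the model's training code. Leaving out the label gives the inference-time prompt, which stops after `target:`.

```python
from typing import Optional

EOS_TOKEN = "<|endoftext|>"  # end-of-text marker used in the template above


def build_prompt(hypothesis: str, premise: str, class_label: Optional[str] = None) -> str:
    """Format one MNLI example following the prompt template.

    With a class_label, returns the full training string; without one,
    returns the inference prompt that stops after 'target:'.
    """
    prompt = f"mnli hypothesis: {hypothesis} premise: {premise} target:"
    if class_label is None:
        return prompt
    return f"{prompt} {class_label} {EOS_TOKEN}"
```

Calling `build_prompt(hypothesis, premise, "contradiction")` with the sentences from the example reproduces the training string shown above.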
## Fine-tuning and validation data
Fine-tuning is done using the `train` split of the GLUE MNLI dataset, and performance is measured on the `validation_mismatched` split.
`validation_mismatched` means that the validation examples are not derived from the same sources as those in the training set and therefore do not closely resemble any of the examples seen at training time.
## Fine-tuning procedure
The model was fine-tuned on a Graphcore IPU-POD64 using `popxl`.
Prompt sentences are tokenized and packed together to form 1024-token sequences, following the [Hugging Face packing algorithm](https://github.com/huggingface/transformers/blob/v4.20.1/examples/pytorch/language-modeling/run_clm.py). No padding is used.
Since the model is trained to predict the next token, labels are simply the input sequence shifted by one token.
Given the training format, no extra care is needed to account for different sequences: the model does not need to know which sentence a token belongs to.
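The packing and label shifting described above can be sketched in plain Python. This is an illustration in the spirit of the `group_texts` step in the linked `run_clm.py`, not the `popxl` implementation: the function names are hypothetical, and dropping the leftover tokens after the last full block is an assumption taken from that script.

```python
from itertools import chain
from typing import List, Tuple


def pack_sequences(tokenized: List[List[int]], block_size: int = 1024) -> List[List[int]]:
    """Concatenate tokenized prompts and cut the stream into full blocks.

    Tokens left over after the last full block are dropped (assumption),
    so no padding is needed.
    """
    stream = list(chain.from_iterable(tokenized))
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


def shift_for_clm(block: List[int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) where labels[i] is the token that follows inputs[i]."""
    return block[:-1], block[1:]


# Toy example with a block size of 4 instead of 1024.
blocks = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
inputs, labels = shift_for_clm(blocks[0])
```

Note that a block can start or end in the middle of a prompt; this is acceptable here because every token is predicted only from the tokens before it.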
### Hyperparameters:
- epochs: 
- optimiser: AdamW (beta1: 0.9, beta2: 0.999, eps: 1e-6, weight decay: 0.0, learning rate: 5e-6)
- learning rate schedule: warmup schedule (min: 1e-7, max: 5e-6, warmup proportion: 0.005995)
- batch size: 128
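As a rough illustration of the learning rate schedule, the sketch below ramps the rate from the minimum to the maximum over the warmup proportion of training. Several things here are assumptions rather than facts from this card: the warmup is taken to be linear, the rate is held at the maximum afterwards, and `total_steps` is a hypothetical parameter because the number of epochs is not stated.

```python
def warmup_lr(step: int, total_steps: int, min_lr: float = 1e-7, max_lr: float = 5e-6,
              warmup_proportion: float = 0.005995) -> float:
    """Linear warmup from min_lr to max_lr, then hold at max_lr (both assumed)."""
    warmup_steps = max(1, round(total_steps * warmup_proportion))
    if step >= warmup_steps:
        return max_lr
    return min_lr + (max_lr - min_lr) * step / warmup_steps
```

With a hypothetical `total_steps=10000`, the warmup covers the first 60 steps.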
## Performance
The resulting model matches SOTA performance, reaching 82.5% accuracy on the `validation_mismatched` split (8,107 of 9,832 examples correct).
```
Total number of examples                 9832
Number with badly formed result          0
Number with incorrect result             1725
Number with correct result               8107 
[82.5%]
example 0 = {'prompt_text': "mnli hypothesis: Your contributions were of no help with our students' education. premise: Your contribution helped make it possible for us to provide our students with a quality education. target:", 'class_label': 'contradiction'}
result = {'generated_text': ' contradiction'}
First 10 generated_text and expected class_label results:
 0: 'contradiction'                          contradiction
 1: 'contradiction'                          contradiction
 2: 'entailment'                             entailment
 3: 'contradiction'                          contradiction
 4: 'entailment'                             entailment
 5: 'entailment'                             entailment
 6: 'contradiction'                          contradiction
 7: 'contradiction'                          contradiction
 8: 'entailment'                             neutral
 9: 'contradiction'                          contradiction
```
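The counts in the log above can be reproduced with a small scoring routine. The sketch below is not the evaluation script used for this model: the function name `score` is hypothetical, and treating any output that is not one of the three class labels as "badly formed" is an assumption.

```python
from typing import Dict, List

LABELS = ("entailment", "contradiction", "neutral")


def score(generated: List[str], expected: List[str]) -> Dict[str, float]:
    """Count correct, incorrect and badly formed results and compute accuracy."""
    counts = {"total": len(expected), "badly_formed": 0, "incorrect": 0, "correct": 0}
    for text, label in zip(generated, expected):
        prediction = text.strip()  # generated text starts with a space, e.g. ' contradiction'
        if prediction not in LABELS:
            counts["badly_formed"] += 1  # assumption: anything outside LABELS is badly formed
        elif prediction == label:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1
    counts["accuracy"] = counts["correct"] / counts["total"] if counts["total"] else 0.0
    return counts


# Sanity check against the reported totals: 8107 correct out of 9832.
reported_accuracy = 8107 / 9832
```

Rounded to one decimal place, `100 * reported_accuracy` gives the 82.5% shown in the log.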
## How to use
The model can be loaded with `AutoModelForCausalLM` and used for text generation through the `pipeline` API.
```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# The fine-tuned model is used with the original GPT-J tokenizer.
tokenizer = AutoTokenizer.from_pretrained('EleutherAI/gpt-j-6B')
hf_model = AutoModelForCausalLM.from_pretrained("Graphcore/gptj-mnli", pad_token_id=tokenizer.eos_token_id)
generator = pipeline('text-generation', model=hf_model, tokenizer=tokenizer)
# The trailing space after "education." keeps the joined string in the training prompt format.
prompt = "mnli hypothesis: Your contributions were of no help with our students' education. " \
         "premise: Your contribution helped make it possible for us to provide our students with a quality education. target:"
out = generator(prompt, return_full_text=False, max_new_tokens=5, top_k=1)
# [{'generated_text': ' contradiction'}]
```
