# Encoder-Decoder model with DeBERTa decoder

## Pre-trained models

Encoder: `microsoft/deberta-v3-small`

Decoder: `deliciouscat/deberta-v3-base-decoder-v0.1` (6 transformer layers, 8 attention heads)
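
If you want to confirm that shape, the decoder checkpoint's configuration can be inspected directly. A small sketch, assuming the checkpoint exposes the standard DeBERTa config fields:

```python
from transformers import AutoConfig

# Load only the configuration of the decoder checkpoint listed above.
# Assumes it uses the standard DeBERTa-v3 config field names.
decoder_config = AutoConfig.from_pretrained("deliciouscat/deberta-v3-base-decoder-v0.1")

# Should report 6 layers and 8 heads, as stated in this card.
print(decoder_config.num_hidden_layers, decoder_config.num_attention_heads)
```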

## Data used

`HuggingFaceFW/fineweb`, from which 124,800 documents were sampled
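
The card does not say how the sample was drawn; one straightforward way to pull a subset of that size is to stream the dataset with the `datasets` library. A sketch, not the original sampling code:

```python
from datasets import load_dataset

# Stream FineWeb so the full corpus never has to be downloaded,
# then keep the first 124,800 documents. The original sampling
# procedure is not documented here; this is only an illustration.
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
sample = [row["text"] for row in fineweb.take(124_800)]
```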

## Training hparams

Optimizer: AdamW, lr=2.3e-5, betas=(0.875, 0.997)

Batch size: 12 (the largest that fits in a Colab Pro A100 environment)
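
In PyTorch terms, the settings above correspond to roughly the following. `model` and `train_dataset` are placeholders for the encoder-decoder model (see below) and the tokenized FineWeb sample; the rest of the training loop is not described on this card:

```python
import torch
from torch.utils.data import DataLoader

# Optimizer exactly as listed above.
optimizer = torch.optim.AdamW(model.parameters(), lr=2.3e-5, betas=(0.875, 0.997))

# Batch size 12, the largest that fit on the Colab Pro A100 used for training.
# `train_dataset` stands in for the tokenized FineWeb sample and is not defined here.
train_loader = DataLoader(train_dataset, batch_size=12, shuffle=True)
```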

## How to use

The snippet below assembles the encoder-decoder architecture from the two checkpoints listed above:

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Combine the encoder and decoder checkpoints listed above.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/deberta-v3-small",
    "deliciouscat/deberta-v3-base-decoder-v0.1",
)
# Both halves use the DeBERTa-v3 tokenizer, so load it from the encoder.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
```
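
A short generation sketch, continuing from the snippet above; the special-token settings are assumptions based on the DeBERTa-v3 tokenizer, since the card does not state them:

```python
# Continues from the snippet above (uses `model` and `tokenizer`).
text = "DeBERTa improves BERT and RoBERTa using disentangled attention."
inputs = tokenizer(text, return_tensors="pt")

# EncoderDecoderModel needs these ids before generate() can run;
# the values below are assumptions based on the DeBERTa-v3 tokenizer.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```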

## Future work!

Train on more scientific data.

Fine-tune on a keyword-extraction task.