---
datasets:
- HuggingFaceFW/fineweb
language:
- en
---
# Encoder-Decoder model with DeBERTa decoder
## Pre-trained models
- Encoder: `microsoft/deberta-v3-small`
- Decoder: `deliciouscat/deberta-v3-base-decoder-v0.1` (6 transformer layers, 8 attention heads; a quick config check follows below)
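
As a sanity check, the decoder geometry can be read off the combined checkpoint's config. This is a minimal sketch; the `expected` values are simply the numbers quoted above:

```python
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")

# The decoder sub-config should match the card: 6 layers, 8 heads.
print(model.config.decoder.num_hidden_layers)    # expected: 6
print(model.config.decoder.num_attention_heads)  # expected: 8
```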
## Data used
`HuggingFaceFW/fineweb`, subsampled to 124,800 examples
## Training hparams
- optimizer: AdamW, lr=2.3e-5, betas=(0.875, 0.997)
- batch size: 12 (the maximum that fits in the Colab Pro A100 environment)
- objective: BART-style denoising (see the sketch below)
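
The card only names the objective as BART-style denoising, so the following is a hedged sketch of one training step: random token masking stands in for BART's span corruption, and the `corrupt` helper and its `mask_prob` are illustrative assumptions rather than the actual training code. It also assumes the checkpoint's config sets `decoder_start_token_id` and `pad_token_id`, which `transformers` needs to build decoder inputs from `labels`:

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
tokenizer = AutoTokenizer.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")

# Hyperparameters as reported above; everything else is illustrative.
optimizer = AdamW(model.parameters(), lr=2.3e-5, betas=(0.875, 0.997))

def corrupt(text, mask_prob=0.15):
    # Crude random token masking as a stand-in for BART-style span corruption.
    ids = tokenizer(text, truncation=True, return_tensors="pt").input_ids
    noisy = ids.clone()
    noisy[torch.rand(ids.shape) < mask_prob] = tokenizer.mask_token_id
    return noisy, ids

input_ids, labels = corrupt("An example FineWeb passage to reconstruct.")
loss = model(input_ids=input_ids, labels=labels).loss  # cross-entropy vs. the clean text
loss.backward()
optimizer.step()
optimizer.zero_grad()
```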
## How to use
```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Load the encoder-decoder checkpoint and its tokenizer
model = EncoderDecoderModel.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
tokenizer = AutoTokenizer.from_pretrained("deliciouscat/deberta-v3-base-encoder-decoder-v0.2")
```
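
A short denoising example; the input string and generation settings are illustrative, and generation again assumes the checkpoint's config sets `decoder_start_token_id` and `pad_token_id`:

```python
# Reconstruct a corrupted sentence; [MASK] is the DeBERTa mask token.
text = "The quick brown [MASK] jumps over the lazy [MASK]."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```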
## Future work!
- train on more scientific data
- fine-tune on a keyword extraction task