Added note on normalizer.
README.md
CHANGED
@@ -9,7 +9,9 @@ licenses:
This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT**. This is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many NLP tasks in Bengali.
-For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, `Natural Language Inference` etc., refer to the scripts in the official [repository](https://github.com/csebuetnlp/banglabert).
+For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, `Natural Language Inference` etc., refer to the scripts in the official GitHub [repository](https://github.com/csebuetnlp/banglabert).
+
+**Note**: This model was pretrained using a specific normalization pipeline available [here](https://github.com/csebuetnlp/normalizer). All finetuning scripts in the official GitHub repository use this normalization by default. If you need to adapt the pretrained model for a different task, make sure the text units are normalized using this pipeline before tokenizing for best results. A basic example is given below:
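The ordering the note asks for (normalize first, tokenize second) can be sketched as a small helper. This is an illustrative sketch, not code from the repository: `normalize_then_tokenize` is a hypothetical name, and the import path `from normalizer import normalize` and the model id `csebuetnlp/banglabert` shown in the comments are assumptions based on the linked GitHub repositories. The demo at the bottom uses stand-in callables so it runs without either package.

```python
from typing import Callable, Iterable, List


def normalize_then_tokenize(
    texts: Iterable[str],
    normalize: Callable[[str], str],
    tokenize: Callable[[str], List[str]],
) -> List[List[str]]:
    """Run each text through the normalizer BEFORE it reaches the tokenizer."""
    return [tokenize(normalize(text)) for text in texts]


# With the real packages, the call might look like this (assumed names,
# not verified here):
#   from normalizer import normalize
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")
#   token_lists = normalize_then_tokenize(sentences, normalize, tokenizer.tokenize)

if __name__ == "__main__":
    # Stand-ins for demonstration only: whitespace collapsing is NOT the
    # real normalization pipeline, and splitting on spaces is NOT the
    # model's tokenizer.
    def collapse_whitespace(text: str) -> str:
        return " ".join(text.split())

    def split_on_space(text: str) -> List[str]:
        return text.split(" ")

    print(normalize_then_tokenize(["a   b", " c "], collapse_whitespace, split_on_space))
```

Passing the normalizer in as an argument keeps the ordering explicit and makes it easy to confirm that no text reaches the tokenizer unnormalized.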
## Using this model as a discriminator in `transformers` (tested on 4.11.0.dev0)