---
license: cc-by-sa-4.0
language:
- hr
- bs
- sr
---

# XLM-R-SloBERTić

This model was produced by further pre-training [XLM-Roberta-large](https://huggingface.co/xlm-roberta-large) for 48k steps on South Slavic languages.

# Benchmarking

Three tasks were chosen for model evaluation:

* Named Entity Recognition (NER)
* Sentiment regression
* COPA (Choice of Plausible Alternatives)

In all cases, this model was fine-tuned for the specific downstream task.
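
As an illustration of what such fine-tuning might start from, the sketch below loads a checkpoint with a single-output head suitable for a regression task, using the Hugging Face `transformers` library. The Hub identifier `classla/xlm-r-slobertic` and the helper name `load_for_regression` are assumptions made for this example and are not taken from the benchmarking code.

```python
# Assumed Hub identifier for this model; adjust if the repository name differs.
MODEL_ID = "classla/xlm-r-slobertic"


def load_for_regression(model_id: str = MODEL_ID):
    """Hypothetical helper: load a tokenizer and a model with a one-output head."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # num_labels=1 gives the classification head a single output; with float
    # labels, transformers treats the task as regression and uses an MSE loss.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=1
    )
    return tokenizer, model
```

Calling `load_for_regression()` downloads the weights, after which the returned pair can be passed to a standard training loop or to `transformers.Trainer`. The actual hyperparameters used for the results below are documented in the benchmarking repository, not here.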

## NER

(entry to be added soon)

## Sentiment regression

The [ParlaSent dataset](https://huggingface.co/datasets/classla/ParlaSent) was used to evaluate sentiment regression for Bosnian, Croatian, and Serbian.

The procedure is explained in greater detail in the dedicated [benchmarking repository](https://github.com/clarinsi/benchich/tree/main/sentiment).

| system                                                                 | train               | test                     |    R² |
|:-----------------------------------------------------------------------|:--------------------|:-------------------------|------:|
| [xlm-r-parlasent](https://huggingface.co/classla/xlm-r-parlasent)      | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.615 |
| [BERTić](https://huggingface.co/classla/bcms-bertic)                   | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.612 |
| XLM-R-SloBERTić                                                        | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.607 |
| XLM-Roberta-Large                                                      | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.605 |
| **XLM-R-BERTić**                                                       | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.601 |
| [crosloengual-bert](https://huggingface.co/EMBEDDIA/crosloengual-bert) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.537 |
| XLM-Roberta-Base                                                       | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.500 |
| dummy (mean)                                                           | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | -0.12 |
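
The score reported above is the coefficient of determination. As a minimal sketch of how it can be computed, and of why a mean-predicting dummy can score below zero on a held-out test set, consider the following. The function `r2_score` here is written from the textbook definition and the labels are invented toy values, not ParlaSent data; the benchmark itself may use a library implementation.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy labels, invented for illustration only (not ParlaSent data).
train_labels = [1.0, 2.0, 2.0, 3.0]
test_labels = [2.0, 3.0, 4.0, 5.0]

# Dummy baseline: always predict the mean of the training labels.
dummy_pred = [mean(train_labels)] * len(test_labels)

print(r2_score(test_labels, test_labels))  # perfect predictions score 1.0
print(r2_score(test_labels, dummy_pred))   # negative: train mean != test mean
```

Predicting the mean of the test labels themselves would score exactly zero, so a dummy fitted on the training split falls below zero whenever the two splits have different means.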

## COPA

(to be added soon)

# Citation

(to be added soon)

# Authors

* [Nikola Ljubešić](https://huggingface.co/nljubesi)