Taja Kuzman committed: Update README.md

README.md CHANGED
```diff
@@ -277,13 +277,17 @@ trained on all three datasets, outperforms classifiers that were trained on just
 
 Additionally, we evaluated the X-GENRE classifier on the multilingual X-GINCO dataset, which comprises samples
 of texts from the MaCoCu web corpora (http://hdl.handle.net/11356/1969).
-The X-GINCO dataset comprises 790 instances in 10 languages -
+The X-GINCO dataset comprises 790 manually-annotated instances in 10 languages -
 Albanian, Croatian, Catalan, Greek, Icelandic, Macedonian, Maltese, Slovenian, Turkish, and Ukrainian.
 To evaluate performance across genre labels, the dataset is balanced by label,
 and the vague label "Other" is not included.
 Additionally, instances that were predicted with a confidence score below 0.80 were not included in the test dataset.
+
+
 The evaluation shows high cross-lingual performance of the model,
 even when applied to languages that are not related to the training languages (English and Slovenian) and when applied to texts in non-Latin scripts.
+
+
 The outlier is Maltese, on which the classifier does not perform well -
 we presume this is because Maltese is not included in the pretraining data of the XLM-RoBERTa model.
 
```
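The confidence-based filtering mentioned in the diff (dropping instances predicted with a score below 0.80) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the `filter_confident` helper, and the example records are all assumptions, modelled loosely on the `{"label": ..., "score": ...}` output shape of a Hugging Face text-classification pipeline.

```python
# Sketch of the confidence filtering described in the README change.
# Assumption: each prediction is a dict with "text", "label", and "score"
# keys, as a text-classification pipeline would typically return.

CONFIDENCE_THRESHOLD = 0.80  # threshold stated in the README

def filter_confident(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Keep only instances whose predicted label reached the threshold."""
    return [p for p in predictions if p["score"] >= threshold]

# Illustrative records, not real X-GINCO data:
predictions = [
    {"text": "...", "label": "News", "score": 0.97},
    {"text": "...", "label": "Opinion/Argumentation", "score": 0.55},
]

kept = filter_confident(predictions)
print(len(kept))  # → 1: only the high-confidence instance remains
```

Applied during test-set construction, this keeps evaluation focused on instances the classifier labels with reasonable certainty.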