## AGILE - Automatic Genre Identification Benchmark

We set up a benchmark for evaluating the robustness of automatic genre identification models and testing their usability for the automatic enrichment of large text collections with genre information. The benchmark comprises 11 European languages and two test datasets.
You are welcome to submit your entry at the [benchmark's GitHub repository](https://github.com/TajaKuzman/AGILE-Automatic-Genre-Identification-Benchmark/tree/main).

The model outperforms all other evaluated technologies, including GPT models used in a zero-shot scenario.

Results on the English test dataset (EN-GINCO):

| Model | Test Dataset | Macro F1 | Micro F1 |
|:---------------------------------------------------------------------------------------------------------|:---------------|-----------:|-----------:|
| [X-GENRE classifier](https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier) | en-ginco | 0.687 | 0.684 |
| GPT-4o (gpt-4o-2024-08-06) (zero-shot) | en-ginco | 0.62 | 0.735 |
| Llama 3.3 (70B) (zero-shot) | en-ginco | 0.586 | 0.684 |
| Gemma 2 (27B) (zero-shot) | en-ginco | 0.564 | 0.603 |
| Gemma 3 (27B) (zero-shot) | en-ginco | 0.541 | 0.672 |
| GPT-4o-mini (gpt-4o-mini-2024-07-18) (zero-shot) | en-ginco | 0.534 | 0.632 |
| Support Vector Machine | en-ginco | 0.514 | 0.489 |
| GPT-3.5-Turbo (zero-shot) | en-ginco | 0.494 | 0.625 |
| DeepSeek-R1 14B (zero-shot) | en-ginco | 0.293 | 0.229 |
| Dummy classifier (stratified) | en-ginco | 0.088 | 0.154 |
| Dummy classifier (most frequent) | en-ginco | 0.032 | 0.169 |

Results on the multilingual test dataset (X-GINCO), comprising instances in Albanian, Catalan, Croatian, Greek, Icelandic, Macedonian, Maltese, Slovenian, Turkish, and Ukrainian:

| Model | Test Dataset | Macro F1 | Micro F1 |
|:---------------------------------------------------------------------------------------------------------|:---------------|-----------:|-----------:|
| [X-GENRE classifier](https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier) | x-ginco | 0.847 | 0.845 |
| GPT-4o (gpt-4o-2024-08-06) (zero-shot) | x-ginco | 0.776 | 0.769 |
| Llama 3.3 (70B) (zero-shot) | x-ginco | 0.741 | 0.738 |
| Gemma 3 (27B) (zero-shot) | x-ginco | 0.739 | 0.733 |
| GPT-4o-mini (gpt-4o-mini-2024-07-18) (zero-shot) | x-ginco | 0.688 | 0.67 |
| GPT-3.5-Turbo (zero-shot) | x-ginco | 0.627 | 0.622 |
| Gemma 2 (27B) (zero-shot) | x-ginco | 0.612 | 0.593 |
| DeepSeek-R1 14B (zero-shot) | x-ginco | 0.197 | 0.204 |
| Support Vector Machine | x-ginco | 0.166 | 0.184 |
| Dummy classifier (stratified) | x-ginco | 0.106 | 0.113 |
| Dummy classifier (most frequent) | x-ginco | 0.029 | 0.133 |

(The multilingual test dataset is easier than the English one: the vague label "Other" and instances predicted with a confidence score below 0.80 were excluded from it.)

For language-specific results, see [the AGILE benchmark](https://github.com/TajaKuzman/AGILE-Automatic-Genre-Identification-Benchmark).
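The tables above report both macro and micro F1. As a quick reminder of how the two averages differ — macro F1 gives every genre label equal weight, while micro F1 pools all individual decisions (and, for single-label classification, equals accuracy) — here is a minimal, dependency-free sketch. The function name is ours for illustration, not part of the benchmark code.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Per-label F1, plus macro (unweighted mean over labels) and
    micro (pooled over all decisions) averages."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    per_label = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_label[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    macro = sum(per_label.values()) / len(labels)
    # In single-label classification every error is one FP and one FN,
    # so micro precision = micro recall = accuracy.
    micro = sum(tp.values()) / len(y_true)
    return per_label, macro, micro
```

The gap between the two columns for a given model mostly reflects how evenly it handles the rarer genre labels.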
## Intended use and limitations
| Other | A text that does not fall under any of the other genre categories. | |
## Performance

### Comparison with other models in in-dataset and cross-dataset experiments

The X-GENRE model was compared with `xlm-roberta-base` classifiers fine-tuned on each of the genre datasets separately, using the X-GENRE schema (see the experiments at https://github.com/TajaKuzman/Genre-Datasets-Comparison).

In the in-dataset experiments (training and testing on splits of the same dataset), the X-GENRE classifier outperforms all classifiers trained on a single dataset, except the one trained on the FTD dataset, which covers a smaller number of X-GENRE labels:

| Trained on | Micro F1 | Macro F1 |
|:-------------|-----------:|-----------:|
| FTD | 0.843 | 0.851 |
| X-GENRE | 0.797 | 0.794 |
| CORE | 0.778 | 0.627 |
| GINCO | 0.754 | 0.75 |

When applied to the test splits of each of the datasets, the classifier performs well:

| Trained on | Tested on | Micro F1 | Macro F1 |
|:-------------|:------------|-----------:|-----------:|
| X-GENRE | CORE | 0.837 | 0.859 |
| X-GENRE | FTD | 0.804 | 0.809 |
| X-GENRE | X-GENRE | 0.797 | 0.794 |
| X-GENRE | X-GENRE-dev | 0.784 | 0.784 |
| X-GENRE | GINCO | 0.749 | 0.758 |

The classifier was also compared with the other classifiers on two additional genre datasets, to which the X-GENRE schema was mapped:
- EN-GINCO (available upon request): a sample of the English enTenTen20 corpus
- [FinCORE](https://github.com/TurkuNLP/FinCORE): the Finnish CORE corpus

| Trained on | Tested on | Micro F1 | Macro F1 |
|:-------------|:------------|-----------:|-----------:|
| X-GENRE | EN-GINCO | 0.688 | 0.691 |
| X-GENRE | FinCORE | 0.674 | 0.581 |
| GINCO | EN-GINCO | 0.632 | 0.502 |
| FTD | EN-GINCO | 0.574 | 0.475 |
| CORE | EN-GINCO | 0.485 | 0.422 |

These cross-dataset and cross-lingual experiments show that the X-GENRE classifier, trained on all three datasets, outperforms classifiers trained on just one of them.

Additionally, we evaluated the X-GENRE classifier on the multilingual X-GINCO dataset, which comprises samples of texts from the MaCoCu web corpora (http://hdl.handle.net/11356/1969). The X-GINCO dataset consists of 790 manually annotated instances in 10 languages: Albanian, Croatian, Catalan, Greek, Icelandic, Macedonian, Maltese, Slovenian, Turkish, and Ukrainian. To evaluate performance per genre label, the dataset is balanced by labels, and the vague label "Other" is not included. Additionally, instances predicted with a confidence score below 0.80 were not included in the test dataset.

The evaluation shows high cross-lingual performance of the model, even when it is applied to languages unrelated to the training languages (English and Slovenian) and to non-Latin scripts. The outlier is Maltese, on which the classifier does not perform well; we presume this is because Maltese is not included in the pretraining data of the XLM-RoBERTa model.

| Genre label | ca | el | hr | is | mk | sl | sq | tr | uk | Avg | mt |
|---------------|------|------|------|------|------|------|------|------|------|------|------|
| News | 0.82 | 0.90 | 0.95 | 0.73 | 0.91 | 0.90 | 0.89 | 0.95 | 1.00 | 0.89 | 0.69 |
| Opinion/Argumentation | 0.84 | 0.87 | 0.78 | 0.82 | 0.78 | 0.82 | 0.67 | 0.82 | 0.91 | 0.81 | 0.33 |
| Instruction | 0.75 | 0.71 | 0.75 | 0.78 | 1.00 | 1.00 | 0.95 | 0.90 | 0.95 | 0.86 | 0.69 |
| Information/Explanation | 0.72 | 0.70 | 0.95 | 0.50 | 0.84 | 0.90 | 0.80 | 0.82 | 1.00 | 0.80 | 0.52 |
| Promotion | 0.78 | 0.62 | 0.87 | 0.75 | 0.95 | 1.00 | 0.95 | 0.86 | 0.78 | 0.84 | 0.82 |
| Forum | 0.84 | 0.95 | 0.91 | 0.95 | 1.00 | 1.00 | 0.78 | 0.89 | 0.95 | 0.91 | 0.18 |
| Prose/Lyrical | 0.91 | 1.00 | 0.86 | 1.00 | 0.95 | 0.91 | 0.86 | 0.95 | 1.00 | 0.93 | 0.18 |
| Legal | 0.95 | 1.00 | 1.00 | 0.84 | 0.95 | 0.95 | 0.95 | 1.00 | 1.00 | 0.96 | / |
| Macro-F1 | 0.83 | 0.84 | 0.88 | 0.80 | 0.92 | 0.94 | 0.85 | 0.90 | 0.95 | 0.87 | 0.49 |
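The X-GINCO test set excludes instances whose predicted label had a confidence score below 0.80. That kind of filter can be sketched as follows; this is a minimal illustration over raw classifier logits, and the function is ours, not the authors' code.

```python
import math

def filter_by_confidence(logit_rows, id2label, threshold=0.80):
    """Softmax each row of classifier logits and keep only predictions
    whose top probability reaches the threshold."""
    kept = []
    for logits in logit_rows:
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        conf = max(probs)
        if conf >= threshold:
            kept.append((id2label[probs.index(conf)], conf))
    return kept
```

Applied when building a test set, a filter like this keeps only instances the model labels decisively, which is one reason the X-GINCO scores run higher than the EN-GINCO ones.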
### Fine-tuning hyperparameters

Fine-tuning was performed with `simpletransformers`.
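This excerpt does not list the hyperparameter values themselves. For orientation only, a `simpletransformers` fine-tuning run receives an args dict of roughly the following shape; every value below is an illustrative assumption, not the authors' setting.

```python
# Illustrative only: the shape of a simpletransformers args dict.
# None of these values are the settings used for X-GENRE.
model_args = {
    "num_train_epochs": 10,
    "learning_rate": 1e-5,
    "train_batch_size": 8,
    "max_seq_length": 512,
    # The nine X-GENRE labels from this README (order here is arbitrary).
    "labels_list": ["Other", "Information/Explanation", "News", "Instruction",
                    "Opinion/Argumentation", "Forum", "Prose/Lyrical",
                    "Legal", "Promotion"],
}
```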
|