Taja Kuzman committed on

Update README.md
README.md CHANGED

@@ -136,7 +136,8 @@ The model can be used for classification into topic labels from the
 Based on a manually-annotated test set (in Croatian, Slovenian, Catalan and Greek),
 the model achieves micro-F1 score of 0.733, macro-F1 score of 0.745 and accuracy of 0.733,
 and outperforms the GPT-4o model (version `gpt-4o-2024-05-13`) used in a zero-shot setting.
-If we use only labels that are predicted with a confidence score equal or higher than 0.90,
+If we use only labels that are predicted with a confidence score equal or higher than 0.90,
+the model achieves micro-F1 and macro-F1 of 0.80.
 
 ## Intended use and limitations
 
@@ -216,7 +217,8 @@ and enriched with information which specific subtopics belong to the top-level t
 
 The model was fine-tuned on a training dataset consisting of 15,000 news in four languages (Croatian, Slovenian, Catalan and Greek).
 The news texts were extracted from the [MaCoCu web corpora](https://macocu.eu/) based on the "News" genre label, predicted with the [X-GENRE classifier](https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier).
-The training dataset was automatically annotated with the IPTC Media Topic labels by
+The training dataset was automatically annotated with the IPTC Media Topic labels by
+the [GPT-4o](https://platform.openai.com/docs/models/gpt-4o) model (yielding 0.72 micro-F1 and 0.73 macro-F1 on the test dataset).
 
 Label distribution in the training dataset:
 
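The confidence filtering added in the first hunk (keeping only labels predicted with a score of at least 0.90) can be sketched as follows. This is a minimal illustration, not the model card's own code: the prediction dicts mimic the `{"label": ..., "score": ...}` output shape of a Hugging Face `text-classification` pipeline, and the labels and scores are made-up example values.

```python
# Keep a predicted topic label only when its confidence score is >= 0.90,
# as described in the README edit. Prediction dicts below are illustrative
# stand-ins for pipeline output, not real model predictions.

THRESHOLD = 0.90

def filter_confident(predictions, threshold=THRESHOLD):
    """Return only predictions whose confidence meets the threshold."""
    return [p for p in predictions if p["score"] >= threshold]

# Hypothetical predictions for three news texts:
predictions = [
    {"label": "sport", "score": 0.97},
    {"label": "politics", "score": 0.55},
    {"label": "economy, business and finance", "score": 0.91},
]

confident = filter_confident(predictions)
print([p["label"] for p in confident])
# → ['sport', 'economy, business and finance']
```

Low-confidence predictions can then be dropped or routed to manual review, trading coverage for the higher micro-F1 and macro-F1 (0.80) reported above.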