Update README.md
README.md
CHANGED
@@ -216,7 +216,7 @@ Note this provides both in- and out-of-domain evaluation as some of the tasks an
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
-We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-
+We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-32b and compare it with those of the reference models.
 
 MAE offers a high degree of interpretability, as it is measured on the same scale as the annotation – specifically, in points.
 On the other hand, Spearman’s rank correlation allows to quantify the degree of monotonic association between the two rankings of models outputs and
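The two metrics named in the README text above can be illustrated with a minimal, dependency-free sketch. It is not the authors' evaluation code: the function names (`mean_absolute_error`, `average_ranks`, `spearman_correlation`) and the sample scores are hypothetical. It assumes one expert score and one judge score per evaluated output, on the same point scale.

```python
from typing import List, Sequence


def mean_absolute_error(expert: Sequence[float], judge: Sequence[float]) -> float:
    """Average absolute gap between judge and expert scores, in annotation points."""
    if len(expert) != len(judge) or len(expert) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - j) for e, j in zip(expert, judge)) / len(expert)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_correlation(expert: Sequence[float], judge: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(expert) != len(judge) or len(expert) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(expert), average_ranks(judge)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for four outputs (illustrative only, not real results).
expert_scores = [0, 1, 2, 2]
judge_scores = [0, 2, 1, 2]
print(mean_absolute_error(expert_scores, judge_scores))
print(spearman_correlation(expert_scores, judge_scores))
```

MAE stays in the units of the annotation scale, which makes it easy to read. Spearman's coefficient depends only on the ordering of the scores, so a judge that is consistently offset from the experts but ranks outputs the same way still gets a high correlation.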