Update README.md
README.md
CHANGED
@@ -227,7 +227,7 @@ Note this provides both in- and out-of-domain evaluation as some of the tasks an
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
-We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-7b and compare it with those of the reference models.
+We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-7b-r and compare it with those of the reference models.
 
 MAE offers a high degree of interpretability, as it is measured on the same scale as the annotation – specifically, in points.
 On the other hand, Spearman’s rank correlation allows to quantify the degree of monotonic association between the two rankings of models outputs and
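For readers of this change, the two metrics named in the README paragraph can be illustrated with a minimal sketch. It is not the project's evaluation code: the function names and the example scores are hypothetical, and Spearman's correlation is computed here as the Pearson correlation of average ranks, which is one standard way to handle tied scores.

```python
from statistics import mean


def mean_absolute_error(expert, judge):
    """Average absolute gap between expert and judge scores, in points."""
    return mean(abs(e - j) for e, j in zip(expert, judge))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(expert, judge):
    """Spearman's rho as the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant (zero rank variance).
    """
    rx, ry = average_ranks(expert), average_ranks(judge)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores on a small point scale (illustration only).
expert_scores = [0, 1, 2, 2]
judge_scores = [0, 2, 2, 1]
print(mean_absolute_error(expert_scores, judge_scores))
print(spearman(expert_scores, judge_scores))
```

MAE stays in the units of the annotation scale, while Spearman's rho depends only on how the two score lists order the outputs, so the two numbers answer different questions about the same pair of lists.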