ai-forever committed on
Commit 70913cc · verified · 1 Parent(s): 8036f73

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -216,7 +216,7 @@ Note this provides both in- and out-of-domain evaluation as some of the tasks an
 
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
- We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-7b and compare it with those of the reference models.
+ We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-32b and compare it with those of the reference models.
 
  MAE offers a high degree of interpretability, as it is measured on the same scale as the annotation – specifically, in points.
  On the other hand, Spearman’s rank correlation allows to quantify the degree of monotonic association between the two rankings of models outputs and
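
For readers who want to see how the two metrics named in this hunk relate to each other, here is a minimal sketch in plain Python. It is not the evaluation code behind the model card: the function names and the sample scores are hypothetical, and the README does not say how ties are ranked, so the average-rank convention used here is an assumption.

```python
from statistics import mean


def mean_absolute_error(expert, judge):
    """MAE in the same units as the annotation scale (points)."""
    if len(expert) != len(judge) or not expert:
        raise ValueError("score lists must be non-empty and of equal length")
    return mean(abs(e - j) for e, j in zip(expert, judge))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(expert, judge):
    """Spearman's rho computed as the Pearson correlation of the two rank vectors."""
    if len(expert) != len(judge) or not expert:
        raise ValueError("score lists must be non-empty and of equal length")
    rx, ry = average_ranks(expert), average_ranks(judge)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one list is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five model outputs (made up for illustration only).
expert_scores = [0, 1, 2, 2, 1]
judge_scores = [0, 2, 2, 1, 1]

mae = mean_absolute_error(expert_scores, judge_scores)
rho = spearman(expert_scores, judge_scores)
```

On this toy data the judge is off by 0.4 points on average, which reads directly on the annotation scale, while rho only reflects how well the judge preserves the experts' ordering and is insensitive to a constant offset in the scores.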