Commit cebe6b7 (verified) by ai-forever · 1 Parent(s): 843c117

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -227,7 +227,7 @@ Note this provides both in- and out-of-domain evaluation as some of the tasks an

  <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-7b and compare it with those of the reference models.
+ We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-7b-r and compare it with those of the reference models.

  MAE offers a high degree of interpretability, as it is measured on the same scale as the annotation – specifically, in points.
  On the other hand, Spearman’s rank correlation allows to quantify the degree of monotonic association between the two rankings of models outputs and
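
For readers unfamiliar with the two metrics named in the changed paragraph, the following is a minimal sketch of how MAE and Spearman's rank correlation could be computed between expert scores and a judge model's scores. It is not taken from the repository: the function names, the tie handling (average ranks), and the sample scores are all assumptions made for illustration.

```python
from statistics import mean


def mean_absolute_error(expert, judge):
    """Average absolute gap between two score lists, in annotation points."""
    if len(expert) != len(judge) or not expert:
        raise ValueError("score lists must be non-empty and of equal length")
    return mean(abs(e - j) for e, j in zip(expert, judge))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over every item tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman_rho(expert, judge):
    """Pearson correlation of the two rank vectors (Spearman's rho)."""
    if len(expert) != len(judge) or len(expert) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(expert), average_ranks(judge)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one list is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores on a 0-2 point scale (illustrative data only).
expert_scores = [0, 1, 2, 2, 1]
judge_scores = [0, 2, 2, 1, 1]

print("MAE:", mean_absolute_error(expert_scores, judge_scores))
print("Spearman rho:", spearman_rho(expert_scores, judge_scores))
```

MAE stays in the units of the annotation scale, so it reads directly as "points off on average", while rho only reflects whether the judge orders outputs the way the experts do.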