ai-forever/pollux-judge-32b

ai-forever committed on Jun 25
Commit 70913cc (verified) · 1 parent: 8036f73

Update README.md

Files changed (1)

README.md +1 -1

@@ -216,7 +216,7 @@ Note this provides both in- and out-of-domain evaluation as some of the tasks an
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-7b and compare it with those of the reference models.
+We employed **Spearman’s rank correlation** with expert judgements and **Mean Absolute Error (MAE)** metrics alongside the Verdict Confidence to assess the performance of pollux-judge-32b and compare it with those of the reference models.
 MAE offers a high degree of interpretability, as it is measured on the same scale as the annotation – specifically, in points.
 On the other hand, Spearman’s rank correlation allows to quantify the degree of monotonic association between the two rankings of models outputs and
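The README text touched by this commit names two metrics, MAE and Spearman's rank correlation, between judge scores and expert judgements. As an illustration only, here is a minimal stdlib-only Python sketch of how both could be computed for paired per-item scores. The function names (`mae`, `average_ranks`, `spearman`) and the toy scores are hypothetical and are not taken from the pollux-judge evaluation code; ties are given average ranks, which is the usual convention for Spearman's rho. The Verdict Confidence metric is not covered here.

```python
from statistics import mean


def mae(judge_scores: list[float], expert_scores: list[float]) -> float:
    """Mean absolute error, in the same units (points) as the scores themselves."""
    if len(judge_scores) != len(expert_scores) or not judge_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    return mean(abs(j - e) for j, e in zip(judge_scores, expert_scores))


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(judge_scores: list[float], expert_scores: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(judge_scores) != len(expert_scores) or len(judge_scores) < 2:
        raise ValueError("need at least two paired scores of equal length")
    rx, ry = average_ranks(judge_scores), average_ranks(expert_scores)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("Spearman's rho is undefined when all scores are equal")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy scores on a 0-2 point scale, one pair per model output.
    judge = [2, 1, 0, 2, 1]
    expert = [2, 1, 1, 2, 0]
    print(mae(judge, expert))       # 0.4
    print(spearman(judge, expert))  # 0.75
```

Because MAE stays in points, 0.4 reads directly as "off by 0.4 points on average", while Spearman's rho only reflects whether the judge orders the outputs the same way the experts do; the two can disagree, which is one reason to report both.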