Update README.md


README.md CHANGED

@@ -99,6 +99,8 @@ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing
 * Evaluations were conducted on the [Lyte/ConnectFour-T10](hf.co/datasets/Lyte/ConnectFour-T10) dataset's validation split to test whether the model learns to win by presenting it with a board showing only the winning position left.
+* Evaluation sampling parameters:
+  * `temperature=0.6`, `top_p=0.95`, `max_tokens=1024`
 #### Summary Metrics Comparison
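
This change does not show how the evaluation harness applied the sampling parameters it adds. As a rough illustration of what `temperature` and `top_p` typically control, here is a minimal stdlib-only sketch, assuming standard temperature scaling followed by nucleus (top-p) truncation. The names `EVAL_SAMPLING`, `nucleus_filter`, and `sample_token` are hypothetical and not part of the model's code; `max_tokens` only caps how many tokens are generated, so it appears here as a stored setting rather than in the sampling logic.

```python
import math
import random

# Sampling settings quoted in the README change above.
EVAL_SAMPLING = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 1024}


def nucleus_filter(logits, temperature, top_p):
    """Return {token_index: probability} after temperature scaling and top-p truncation."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise over the surviving tokens.
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng, temperature, top_p):
    """Draw one token index from the filtered distribution (assumed scheme)."""
    dist = nucleus_filter(logits, temperature, top_p)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


# Toy example: four candidate tokens with made-up logits.
toy_logits = [2.0, 1.0, 0.1, -3.0]
filtered = nucleus_filter(
    toy_logits, EVAL_SAMPLING["temperature"], EVAL_SAMPLING["top_p"]
)
token = sample_token(
    toy_logits, random.Random(0), EVAL_SAMPLING["temperature"], EVAL_SAMPLING["top_p"]
)
```

With these toy logits, a temperature below 1 sharpens the distribution, and the 0.95 cutoff discards the two least likely candidates before sampling. An actual inference library may apply these steps in a different order or combine them with other filters.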