Update README.md
README.md
CHANGED
@@ -99,6 +99,8 @@ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing

 * Evaluations were conducted on the [Lyte/ConnectFour-T10](hf.co/datasets/Lyte/ConnectFour-T10) dataset's validation split to test whether the model learns to win by presenting it with a board showing only the winning position left.

+* evals sampling parameters are as follows:
+* temperature=0.6, top_p=0.95, max_tokens=1024

 #### Summary Metrics Comparison

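
For readers unfamiliar with the sampling parameters added in this change, the sketch below illustrates what `temperature` and `top_p` do to a next-token distribution. It is a minimal, dependency-free illustration, not the author's evaluation code: the names `EVAL_SAMPLING`, `top_p_probs`, and `sample_token` are hypothetical, and the inference library actually used for the evals is not stated in the README. `max_tokens` is included only as a value; it caps how many tokens are generated per response and is not exercised here.

```python
import math
import random

# Values copied from the README change; the dict name is an assumption.
EVAL_SAMPLING = {"temperature": 0.6, "top_p": 0.95, "max_tokens": 1024}


def top_p_probs(logits, temperature, top_p):
    """Apply temperature scaling, then nucleus (top-p) filtering.

    Returns a renormalized probability list the same length as `logits`,
    with zeros for tokens that fall outside the nucleus.
    """
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the kept tokens only.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature, top_p, rng=random):
    """Draw one token index from the filtered distribution."""
    probs = top_p_probs(logits, temperature, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With toy logits such as `[2.0, 1.0, -5.0]`, the third token's probability is far below the 5% tail that `top_p=0.95` discards, so `sample_token` can only return index 0 or 1.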