Update README.md
README.md
CHANGED
@@ -206,7 +206,7 @@ Data Labeling for Evaluation Datasets:
 - Hybrid: Human/Synthetic/Automatic

 ## Evaluation Results
-
+These results cover both “Reasoning On” and “Reasoning Off” modes. We recommend temperature=`0.6` and top_p=`0.95` for “Reasoning On” mode, and greedy decoding for “Reasoning Off” mode. All evaluations use a 32k sequence length. We run each benchmark up to 16 times and average the scores for more accurate results.

 > NOTE: Where applicable, a Prompt Template will be provided. While completing benchmarks, please ensure that you are parsing for the correct output format as per the provided prompt in order to reproduce the benchmarks seen below.
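For reference, the recommended settings in the added line map onto a generation call roughly as in the sketch below. This is a minimal sketch assuming the Hugging Face `transformers` API; the model ID and prompt are placeholders, not taken from this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID -- substitute the checkpoint this README documents.
model_id = "org/model-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Your benchmark prompt here", return_tensors="pt")

# "Reasoning On": sample with the recommended temperature and top_p.
reasoning_on = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=1024,
)

# "Reasoning Off": greedy decoding, i.e. no sampling.
reasoning_off = model.generate(**inputs, do_sample=False, max_new_tokens=1024)

print(tokenizer.decode(reasoning_on[0], skip_special_tokens=True))
```

Because the “Reasoning On” mode samples, repeating each benchmark up to 16 times and averaging the scores, as the added line describes, reduces run-to-run variance.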