Update README.md
README.md
CHANGED
@@ -96,7 +96,7 @@ for output in outputs:

 # Evaluation

-We evaluated this model for output accuracy and the percentage of valid Japanese `<think>` sections using the first 50 rows of the
+We evaluated this model for output accuracy and the percentage of valid Japanese `<think>` sections using the first 50 rows of the [SakanaAI/gsm8k-ja-test_250-1319](https://huggingface.co/datasets/SakanaAI/gsm8k-ja-test_250-1319) dataset.

 We compare this to the original R1 model and test in both regimes where repetition penalty is 1.0 and 1.1:

@@ -110,7 +110,7 @@ We compare this to the original R1 model and test in both regimes where repetiti
 Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).


-We further use the first 50 prompts from
+We further use the first 50 prompts from [DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja) to evaluate the percentage of valid Japanese `\<think\>` sections in model responses.
 This benchmark contains more varied and complex prompts, meaning this is a more realistic evaluation of how reliably this model can output Japanese.

 | | Repetition Penalty | Valid Japanese `<think>` (%) |