Update README.md
README.md
CHANGED
@@ -107,7 +107,7 @@ We compare this to the original R1 model and test in both regimes where repetiti
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 66 | 92 |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 70 | 98 |
 
-Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)
+Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).
 
 
 We further use the first 50 prompts from (DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja)[https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja] to evaluate the percentage of valid Japanese `\<think\>` sections in model responses.
@@ -120,7 +120,7 @@ This benchmark contains more varied and complex prompts, meaning this is a more
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 84 |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 94 |
 
-Code for the DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja evaluation can be found [here](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)
+Code for the DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja evaluation can be found [here](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing).
 
 # How this model was made
 
@@ -228,7 +228,7 @@ for output in outputs:
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 66 | 92 |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 70 | 98 |
 
-SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)
+SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)にあります。
 
 さらに、(DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja)[https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja]の最初の50プロンプトを使用して、モデル応答における有効な日本語の`<think>`セクションの割合を評価します。このベンチマークにはより多様で複雑なプロンプトが含まれており、モデルが日本語を信頼性高く出力できるかどうかを、より現実的に評価します。
 
@@ -239,7 +239,7 @@ SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.go
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0 | 84 |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1 | 94 |
 
-DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja評価コードは[こちら](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)
+DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja評価コードは[こちら](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)にあります。
 
 # 作成方法
 