lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese

@@ -107,7 +107,7 @@ We compare this to the original R1 model and test in both regimes where repetiti
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0                | 66                  | 92                         |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1                | 70                  | 98                         |
-Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing):
+Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).
 We further use the first 50 prompts from (DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja)[https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja] to evaluate the percentage of valid Japanese `\<think\>` sections in model responses.
@@ -120,7 +120,7 @@ This benchmark contains more varied and complex prompts, meaning this is a more
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0                | 84                         |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1                | 94                         |
-Code for the DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja evaluation can be found [here](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing):
+Code for the DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja evaluation can be found [here](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing).
 # How this model was made
@@ -228,7 +228,7 @@ for output in outputs:
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0                | 66                  | 92                         |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1                | 70                  | 98                         |
-SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)にあります：
+SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing)にあります。
 さらに、(DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja)[https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja]の最初の50プロンプトを使用して、モデル応答における有効な日本語の`<think>`セクションの割合を評価します。このベンチマークにはより多様で複雑なプロンプトが含まれており、モデルが日本語を信頼性高く出力できるかどうかを、より現実的に評価します。
@@ -239,7 +239,7 @@ SakanaAI/gsm8k-ja-test_250-1319の評価コードは[こちら](https://drive.go
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.0                | 84                         |
 | lightblue/DeepSeek-R1-Distill-Qwen-7B-Japanese | 1.1                | 94                         |
-DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja評価コードは[こちら](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)にあります：
+DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja評価コードは[こちら](https://drive.google.com/file/d/1f75IM5x1SZrb300odkEsLMfKsfibrxvR/view?usp=sharing)にあります。
 # 作成方法