tngtech/DeepSeek-TNG-R1T2-Chimera

TNGHK committed 11 days ago

Commit c0f05b5 · verified · 1 parent: 2827249

Update README.md

Files changed (1)

README.md +1 -0

@@ -67,6 +67,7 @@ We report measured benchmark results for our R1T2, R1T models and published benc
 | AIME-25                            | 70.0 | 58.3 |    49.6 | 70.0 |    87.5 |         | V3-0324 AIME-25 measured by us |
 | GPQA-Diamond                       | 77.9 | 72.0 |    68.4 | 71.5 |    81.0 |         |         |
 | Aider Polyglot                     | 64.4 | 48.4 |    44.9 | 52.0 |    71.6 | R1T2 beats two of its parents, V3-0324 and R1, and was measured to be about 2.2 times more token efficient, i.e. faster, than its third parent, R1-0528 | R1T2 source: Aider discord, t=0.75 |
+| MMLU-Pro Computer Science          | 83.7-85.6 | 82.9-84.6 | 81.5-82.4 | 85.1-85.3 | 84.6-86.1 |         |         |
 | EQ-Bench Longform Creative Writing | 76.4 |  ./. |    78.1 | 74.6 |    78.9 | EQ Bench version before August 8th, 2025 | see [EQ Bench](https://eqbench.com/creative_writing_longform.html)  |
 | Vectara Hallucination Rate         |  5.5 |  ./. |     8.0 | 14.3 |     7.7 | lower hallucination rates are better, R1T2 is better than all its three parents | see [Hallucination Leaderboard](https://github.com/vectara/hallucination-leaderboard) |