Update README.md

README.md CHANGED

@@ -17,31 +17,30 @@ The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical pr

The AceMath-1.5B/7B/72B-Instruct models are developed from the Qwen2.5-Math-1.5B/7B/72B-Base models, leveraging a multi-stage supervised fine-tuning (SFT) process: first with general-purpose SFT data, followed by math-specific SFT data. We are releasing all training data to support further research in this field.

+We recommend using the AceMath models only for solving math problems. To support other tasks, we also release AceInstruct-1.5B/7B/72B, a series of general-purpose SFT models designed to handle code, math, and general-knowledge tasks. These models are built upon the Qwen2.5-1.5B/7B/72B-Base models.

For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).

## All Resources
[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct)   [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct)   [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)

[AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM)   [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)

[AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data)   [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)

-[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)   [AceMath Evaluation Script](https://huggingface.co/datasets/nvidia/AceMath-RewardBench/tree/main/scripts)
+[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)   [AceMath-Instruct Evaluation Script](https://huggingface.co/datasets/nvidia/AceMath-Evaluation-Script)
+
+[AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B)   [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B)   [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)

## Benchmark Results

-| MATH | 81.10 | 75.90 | 73.80 | 75.80 | 83.60 | 85.90 | 76.84 | 83.14 | 86.10 |
-| Minerva Math | 50.74 | 48.16 | 54.04 | 29.40 | 37.10 | 44.10 | 41.54 | 51.11 | 56.99 |
-| GaoKao 2023En | 67.50 | 64.94 | 62.08 | 65.50 | 66.80 | 71.90 | 64.42 | 68.05 | 72.21 |
-| Olympiad Bench | 43.30 | 37.93 | 34.81 | 38.10 | 41.60 | 49.00 | 33.78 | 42.22 | 48.44 |
-| College Math | 48.50 | 48.47 | 49.25 | 47.70 | 46.80 | 49.50 | 54.36 | 56.64 | 57.24 |
-| MMLU STEM | 87.99 | 85.06 | 83.10 | 57.50 | 71.90 | 80.80 | 62.04 | 75.32 | 85.44 |
-| Average | 67.43 | 65.27 | 64.84 | 56.97 | 63.29 | 68.16 | 59.99 | 67.17 | 71.84 |
+<p align="center">
+<img src="https://research.nvidia.com/labs/adlr/images/acemath/acemath.png" alt="AceMath Benchmark Results" width="800">
+</p>

+We compare AceMath against leading proprietary and open-access math models in the table above. Our AceMath-7B-Instruct largely outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) on a variety of math reasoning benchmarks, while coming close to the 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a clear margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks (a minimal sketch of this best-of-n selection appears after the diff). These comparisons exclude OpenAI's o1 model, which relies on scaled inference computation.

## How to use

@@ -91,5 +90,4 @@ If you find our work helpful, we’d appreciate it if you could cite us.

## License
All models in the AceMath family are for non-commercial use only, subject to the [Terms of Use](https://openai.com/policies/row-terms-of-use/) for the data generated by OpenAI. We release the AceMath models under the [Creative Commons Attribution-NonCommercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0) license.
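
The rm@8 numbers above come from best-of-n selection: sample eight candidate solutions from an AceMath-Instruct model, score each with an AceMath reward model, and keep the highest-scoring one. The sketch below only illustrates the idea; the reward-model loading path (assumed here to expose a scalar score through `AutoModelForSequenceClassification`), the prompt format, and the sampling parameters are assumptions, so check the Hugging Face model cards for the recommended usage.

```python
# Illustrative sketch of rm@8 (best-of-8): draw 8 sampled solutions from an
# AceMath-Instruct model, score each with an AceMath reward model, and keep the
# highest-scoring one. Reward-model loading details are assumptions; see the
# model cards for the exact recommended usage.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

POLICY_ID = "nvidia/AceMath-7B-Instruct"
REWARD_ID = "nvidia/AceMath-7B-RM"

policy_tok = AutoTokenizer.from_pretrained(POLICY_ID)
policy = AutoModelForCausalLM.from_pretrained(
    POLICY_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

rm_tok = AutoTokenizer.from_pretrained(REWARD_ID)
# Assumption: the reward model returns one scalar score per (question, solution) pair.
reward = AutoModelForSequenceClassification.from_pretrained(
    REWARD_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

question = "Compute the sum of all integers from 1 to 100."
prompt = policy_tok.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = policy_tok(prompt, return_tensors="pt").to(policy.device)

# The "8" in rm@8: eight independent samples for the same question.
outputs = policy.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    max_new_tokens=1024,
    num_return_sequences=8,
)
prompt_len = inputs["input_ids"].shape[-1]
candidates = [
    policy_tok.decode(seq[prompt_len:], skip_special_tokens=True) for seq in outputs
]


def rm_score(solution: str) -> float:
    """Score one candidate solution with the reward model."""
    text = rm_tok.apply_chat_template(
        [
            {"role": "user", "content": question},
            {"role": "assistant", "content": solution},
        ],
        tokenize=False,
    )
    batch = rm_tok(text, return_tensors="pt").to(reward.device)
    with torch.no_grad():
        return reward(**batch).logits.squeeze().item()


best_solution = max(candidates, key=rm_score)
print(best_solution)
```

The reported rm@8 results use AceMath-72B-RM as the scorer; the sketch uses the 7B checkpoints only to keep the example small.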