Update README.md

README.md CHANGED

@@ -17,31 +17,30 @@ The AceMath-1.5B/7B/72B-Instruct models excel at solving English mathematical pr

The AceMath-1.5B/7B/72B-Instruct models are developed from the Qwen2.5-Math-1.5B/7B/72B-Base models, leveraging a multi-stage supervised fine-tuning (SFT) process: first with general-purpose SFT data, followed by math-specific SFT data. We are releasing all training data to support further research in this field.

+We recommend using the AceMath models only for solving math problems. To support other tasks, we also release AceInstruct-1.5B/7B/72B, a series of general-purpose SFT models designed to handle code, math, and general-knowledge tasks. These models are built upon the Qwen2.5-1.5B/7B/72B-Base models.

For more information about AceMath, check our [website](https://research.nvidia.com/labs/adlr/acemath/) and [paper](https://arxiv.org/abs/2412.15084).

## All Resources
[AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct)   [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct)   [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)

[AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM)   [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)

[AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data)   [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)

-[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)   [AceMath Evaluation Script](https://huggingface.co/datasets/nvidia/AceMath-RewardBench/tree/main/scripts)
+[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)   [AceMath-Instruct Evaluation Script](https://huggingface.co/datasets/nvidia/AceMath-Evaluation-Script)
+
+[AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B)   [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B)   [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)

## Benchmark Results

-| MATH | 81.10 | 75.90 | 73.80 | 75.80 | 83.60 | 85.90 | 76.84 | 83.14 | 86.10 |
-| Minerva Math | 50.74 | 48.16 | 54.04 | 29.40 | 37.10 | 44.10 | 41.54 | 51.11 | 56.99 |
-| GaoKao 2023En | 67.50 | 64.94 | 62.08 | 65.50 | 66.80 | 71.90 | 64.42 | 68.05 | 72.21 |
-| Olympiad Bench | 43.30 | 37.93 | 34.81 | 38.10 | 41.60 | 49.00 | 33.78 | 42.22 | 48.44 |
-| College Math | 48.50 | 48.47 | 49.25 | 47.70 | 46.80 | 49.50 | 54.36 | 56.64 | 57.24 |
-| MMLU STEM | 87.99 | 85.06 | 83.10 | 57.50 | 71.90 | 80.80 | 62.04 | 75.32 | 85.44 |
-| Average | 67.43 | 65.27 | 64.84 | 56.97 | 63.29 | 68.16 | 59.99 | 67.17 | 71.84 |
+<p align="center">
+<img src="https://research.nvidia.com/labs/adlr/images/acemath/acemath.png" alt="AceMath Benchmark Results" width="800">
+</p>

+We compare AceMath against leading proprietary and open-access math models in the table above. Our AceMath-7B-Instruct largely outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) on a variety of math reasoning benchmarks, while coming close to the 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a clear margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks (a minimal sketch of this best-of-n selection appears after the diff). These comparisons exclude OpenAI's o1 model, which relies on scaled inference computation.

## How to use

@@ -91,5 +90,4 @@ If you find our work helpful, we’d appreciate it if you could cite us.

## License
All models in the AceMath family are for non-commercial use only, subject to the [Terms of Use](https://openai.com/policies/row-terms-of-use/) for the data generated by OpenAI. We release the AceMath models under the [Creative Commons Attribution-NonCommercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0) license.
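
The rm@8 numbers above come from best-of-n selection: sample eight candidate solutions from an AceMath-Instruct model, score each with an AceMath reward model, and keep the highest-scoring one. The sketch below only illustrates the idea; the reward-model loading path (assumed here to expose a scalar score through `AutoModelForSequenceClassification`), the prompt format, and the sampling parameters are assumptions, so check the Hugging Face model cards for the recommended usage.

```python
# Illustrative sketch of rm@8 (best-of-8): draw 8 sampled solutions from an
# AceMath-Instruct model, score each with an AceMath reward model, and keep the
# highest-scoring one. Reward-model loading details are assumptions; see the
# model cards for the exact recommended usage.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

POLICY_ID = "nvidia/AceMath-7B-Instruct"
REWARD_ID = "nvidia/AceMath-7B-RM"

policy_tok = AutoTokenizer.from_pretrained(POLICY_ID)
policy = AutoModelForCausalLM.from_pretrained(
    POLICY_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

rm_tok = AutoTokenizer.from_pretrained(REWARD_ID)
# Assumption: the reward model returns one scalar score per (question, solution) pair.
reward = AutoModelForSequenceClassification.from_pretrained(
    REWARD_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

question = "Compute the sum of all integers from 1 to 100."
prompt = policy_tok.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = policy_tok(prompt, return_tensors="pt").to(policy.device)

# The "8" in rm@8: eight independent samples for the same question.
outputs = policy.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    max_new_tokens=1024,
    num_return_sequences=8,
)
prompt_len = inputs["input_ids"].shape[-1]
candidates = [
    policy_tok.decode(seq[prompt_len:], skip_special_tokens=True) for seq in outputs
]


def rm_score(solution: str) -> float:
    """Score one candidate solution with the reward model."""
    text = rm_tok.apply_chat_template(
        [
            {"role": "user", "content": question},
            {"role": "assistant", "content": solution},
        ],
        tokenize=False,
    )
    batch = rm_tok(text, return_tensors="pt").to(reward.device)
    with torch.no_grad():
        return reward(**batch).logits.squeeze().item()


best_solution = max(candidates, key=rm_score)
print(best_solution)
```

The reported rm@8 results use AceMath-72B-RM as the scorer; the sketch uses the 7B checkpoints only to keep the example small.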