Update README.md
README.md CHANGED

@@ -23,7 +23,7 @@ should probably proofread and complete it, then remove this comment. -->
 NuminaMath is a series of language models that are trained with two stages of supervised fine-tuning to solve math problems using chain of thought (CoT) and tool-integrated reasoning (TIR):
 
 * **Stage 1:** fine-tune the base model on a large, diverse dataset of natural language math problems and solutions, where each solution is templated with Chain of Thought (CoT) to facilitate reasoning.
-* **Stage 2:** fine-tune the model from Stage 1 on a synthetic dataset of tool-integrated reasoning, where each math problem is decomposed into a sequence of rationales, Python programs, and their outputs.
+* **Stage 2:** fine-tune the model from Stage 1 on a synthetic dataset of tool-integrated reasoning, where each math problem is decomposed into a sequence of rationales, Python programs, and their outputs.
 
 NuminaMath 7B CoT is the model from Stage 1 and was fine-tuned on [AI-MO/NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT), a large-scale dataset of 860k+ math competition problem-solution pairs.
 
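The hunk above identifies the Stage 1 (CoT) model and its training data. For readers landing on this commit, a minimal inference sketch follows; the repo id comes from the citation URL later in this diff, while the chat-template call and generation settings are assumptions rather than the card's official snippet.

```python
# Minimal sketch: prompting NuminaMath 7B CoT for a chain-of-thought solution.
# The chat-template call and generation settings are assumptions; check the
# model card's own usage section before relying on them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-MO/NuminaMath-7B-CoT"  # repo id taken from the citation URL in this diff

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the remainder when 2^100 is divided by 7?"}
]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

# Greedy decoding keeps the chain of thought reproducible; sampling also works.
output = model.generate(input_ids, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The Stage 2 (TIR) variant would wrap a generate call like this in a loop that extracts each emitted Python block, executes it, and appends the program output back into the context before generating again.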
@@ -39,7 +39,6 @@ NuminaMath 7B CoT is the model from Stage 1 and was fine-tuned on [AI-MO/NuminaM
 <!-- Provide the basic links for the model. -->
 
 - **Repository:** https://github.com/project-numina/aimo-progress-prize
-- **Demo:** https://huggingface.co/spaces/AI-MO/math-olympiad-solver
 
 ## Intended uses & limitations
 
@@ -108,30 +107,14 @@ If you find NuminaMath 7B TIR is useful in your work, please cite it with:
 ```
 @misc{numina_math_7b,
   author = {Edward Beeching and Shengyi Costa Huang and Albert Jiang and Jia Li and Benjamin Lipkin and Zihan Qin and Kashif Rasul and Ziju Shen and Roman Soletskyi and Lewis Tunstall},
-  title = {NuminaMath 7B TIR},
+  title = {NuminaMath 7B CoT},
   year = {2024},
   publisher = {Numina & Hugging Face},
   journal = {Hugging Face repository},
-  howpublished = {\url{https://huggingface.co/AI-MO/NuminaMath-7B-TIR}}
+  howpublished = {\url{https://huggingface.co/AI-MO/NuminaMath-7B-CoT}}
 }
 ```
 
-This model is a fine-tuned version of [deepseek-ai/deepseek-math-7b-base](https://huggingface.co/deepseek-ai/deepseek-math-7b-base) on the AI-MO/numina-dataset-v1.0-release-candidate-1-preproc dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.4859
-
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
 ## Training procedure
 
 ### Training hyperparameters
@@ -150,14 +133,6 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 3.0
 
-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:-----:|:-----:|:---------------:|
-| 0.4814 | 1.0 | 6920 | 0.4942 |
-| 0.4188 | 2.0 | 13840 | 0.4728 |
-| 0.3496 | 3.0 | 20760 | 0.4859 |
-
 
 ### Framework versions
 
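Only two hyperparameters are visible in this hunk; both map onto standard `transformers.TrainingArguments` fields. A minimal sketch of that mapping is below, where every value other than the two shown above is an assumption:

```python
# Minimal sketch: the two hyperparameters visible in the hunk above, expressed
# as transformers.TrainingArguments. All other values are assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="numinamath-7b-cot-sft",  # hypothetical output path
    warmup_ratio=0.1,                    # lr_scheduler_warmup_ratio: 0.1
    num_train_epochs=3.0,                # num_epochs: 3.0
    bf16=True,                           # assumption, typical for a 7B SFT run
)
```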