miromind-ai/MiroMind-M1-RL-32B

Tags: Text Generation · Transformers · Safetensors · English · qwen2 · mathematical-reasoning · qwen · causal-lm · conversational · text-generation-inference



nielsr (HF Staff) committed 15 days ago · Commit ed0092e (verified) · Parent: fffb4c7

Enhance model card with metadata and sample usage (#1)


- Enhance model card with metadata and sample usage (511e455bf82b4762c53dabbd1a594367f93d2330)

Co-authored-by: Niels Rogge

Files changed (1)

README.md +146 -4

README.md CHANGED

@@ -1,9 +1,15 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
 ---
 <!-- markdownlint-disable first-line-h1 -->
@@ -24,7 +30,7 @@ base_model:
 </div>
 # MiroMind-M1
@@ -47,6 +53,7 @@ base_model:
 | OpenThoughts                         | Qwen2.5-7-Instruct           | 31.3   | 23.3   | 83.2    |
 | Open-R1                              | Qwen2.5-Math-7B-Instruct     | 36.7   | 40.0   | 90.6    |
 | Synthetic-1                          | Qwen2.5-7B-Instruct          | 30.0   | 26.6   | 85.6    |
 | **MiroMind-SFT-7B**                  | Qwen2.5-Math-7B             | 60.4   | 45.0   | 94.6    |
 *† means that the score of DeepSeek-R1 on AIME25 is from our evaluation.*
@@ -58,6 +65,7 @@ base_model:
 | DeepSeek-R1-0528                 | 91.4   | 87.5   | –       |
 | Qwen3-8B                         | 76.0   | 67.3   | –       |
 | DeepSeek-R1-0528-Qwen3-8B        | 86.0   | 76.3   | –       |
 | <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |
 | DeepSeek-R1-Distill-Qwen-32B     | 70.8   | 52.1   | 95.8    |
 | Skywork-OR1-32B-Preview          | 77.1   | 68.2   | 97.5    |
@@ -79,3 +87,137 @@ base_model:
 ### Data
 [`MiroMind-M1-SFT-719K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-SFT-719K)<br>
 [`MiroMind-M1-RL-62K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)<br>

 ---
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+language:
+- en
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- mathematical-reasoning
+- qwen
+- causal-lm
 ---
 <!-- markdownlint-disable first-line-h1 -->
 </div>
+This repository contains the MiroMind-M1-RL-32B model, part of the MiroMind-M1 series, described in the paper [MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization](https://huggingface.co/papers/2507.14683).
 # MiroMind-M1
 | OpenThoughts                         | Qwen2.5-7-Instruct           | 31.3   | 23.3   | 83.2    |
 | Open-R1                              | Qwen2.5-Math-7B-Instruct     | 36.7   | 40.0   | 90.6    |
 | Synthetic-1                          | Qwen2.5-7B-Instruct          | 30.0   | 26.6   | 85.6    |
+| MiMo-7B-SFT                          | MiMo-7B-Base          | 58.7   | 44.3   | 93.0    |
 | **MiroMind-SFT-7B**                  | Qwen2.5-Math-7B             | 60.4   | 45.0   | 94.6    |
 *† means that the score of DeepSeek-R1 on AIME25 is from our evaluation.*
 | DeepSeek-R1-0528                 | 91.4   | 87.5   | –       |
 | Qwen3-8B                         | 76.0   | 67.3   | –       |
 | DeepSeek-R1-0528-Qwen3-8B        | 86.0   | 76.3   | –       |
+| MiMo-7B-RL                       | 68.2   | 55.4   | 95.8    |
 | <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |
 | DeepSeek-R1-Distill-Qwen-32B     | 70.8   | 52.1   | 95.8    |
 | Skywork-OR1-32B-Preview          | 77.1   | 68.2   | 97.5    |
 ### Data
 [`MiroMind-M1-SFT-719K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-SFT-719K)<br>
 [`MiroMind-M1-RL-62K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)<br>
+## 🚀 Quickstart
+You can load and run the model with the Hugging Face Transformers library:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+model_name = "miromind-ai/MiroMind-M1-RL-32B" # Or miromind-ai/MiroMind-M1-RL-7B
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True
+)
+prompt = "Given the equation $2x + 5 = 11$, what is the value of $x$?"
+messages = [
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
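+# Pass attention_mask along with input_ids. Reasoning traces can be long,
+# so increase max_new_tokens if the output is cut off.
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512
+)
+# Keep only the newly generated tokens by dropping the prompt prefix.
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]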
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
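The Quickstart prints the raw decoded text. As a minimal post-processing sketch, assuming the response may contain a `</think>` marker that ends the reasoning trace and places its final answer in `\boxed{...}` (common for DeepSeek-R1-distilled math models, but not confirmed by this card), the hypothetical helpers `split_reasoning` and `extract_last_boxed` below separate the two parts and pull out the last boxed answer while honoring nested braces:

```python
from typing import Optional, Tuple


def split_reasoning(response: str) -> Tuple[str, str]:
    """Split a response into (reasoning, answer) at the last '</think>' marker.

    Assumption: the model may emit '</think>'; if it is absent, the whole
    response is treated as the answer text.
    """
    marker = "</think>"
    if marker in response:
        reasoning, _, answer = response.rpartition(marker)
        return reasoning.strip(), answer.strip()
    return "", response.strip()


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: no complete \boxed{...} found


if __name__ == "__main__":
    demo = "2x = 6, so x = 3.</think>The value is $\\boxed{3}$."
    _, answer = split_reasoning(demo)
    print(extract_last_boxed(answer))
```

Applied to the Quickstart, `extract_last_boxed(split_reasoning(response)[1])` would return the boxed answer, or `None` if the output was truncated before one appeared.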
+## 🛠 Getting Started
+### Installation
+Using a venv environment:
+```bash
+git clone https://github.com/MiroMindAsia/MiroMind-M1.git
+cd MiroMind-M1
+# Create and activate a Python 3.10 virtual environment.
+python3.10 -m pip install virtualenv
+virtualenv -p python3.10 venv
+source venv/bin/activate
+# Install dependencies.
+pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
+pip3 install numpy psutil ninja packaging cmake
+pip3 install flash_attn==2.7.4.post1 --no-build-isolation # This may take a while...
+pip3 install -e .
+```
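After installation, it can help to confirm that the pinned versions actually landed in the virtual environment. A minimal stdlib sketch, assuming the distribution names `torch` and `flash_attn` match the pip names used above; `PINS`, `installed_version`, and `find_mismatches` are hypothetical names, and the `+cu124` local-version suffix that CUDA PyTorch wheels carry is ignored when comparing:

```python
import sys
from importlib import metadata
from typing import Callable, Dict, List

# Version pins copied from the pip commands above.
PINS: Dict[str, str] = {"torch": "2.4.0", "flash_attn": "2.7.4.post1"}


def installed_version(dist: str) -> str:
    """Return the installed version of a distribution, or '' if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return ""


def find_mismatches(pins: Dict[str, str],
                    lookup: Callable[[str], str] = installed_version) -> List[str]:
    """List human-readable problems for every pin that is not satisfied."""
    problems = []
    for dist, wanted in pins.items():
        found = lookup(dist)
        base = found.split("+", 1)[0]  # "2.4.0+cu124" -> "2.4.0"
        if base != wanted:
            problems.append(f"{dist}: expected {wanted}, found {found or 'not installed'}")
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"warning: running Python {sys.version.split()[0]}, expected 3.10")
    for line in find_mismatches(PINS):
        print(line)
```

Passing `lookup` as a parameter keeps the comparison logic testable without the heavy packages installed.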
+## 🏋️ Training
+### Multi-Node Training
+Here is a quick guide to starting Ray for multi-node training.
+#### On the head node
+```bash
+ray stop
+ray start --head --node-ip-address $HEAD_NODE_IP --num-gpus 8 --dashboard-host=0.0.0.0
+```
+#### On other nodes
+```bash
+ray stop
+ray start --address="$HEAD_NODE_IP:6379" --num-gpus 8
+```
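The main difference between the two commands is how each node finds the cluster: the head starts it with `--head` and its own IP, while every other node joins through `--address` on port 6379. A hypothetical helper that builds both argument lists (for example, to hand to `subprocess.run` from a launcher script) might look like this; it assumes 8 GPUs per node by default, as in the commands above, and mirrors them rather than any official Ray tooling:

```python
import ipaddress
import shlex
from typing import List

RAY_GCS_PORT = 6379  # port used by the worker command above


def ray_start_args(head_ip: str, is_head: bool, num_gpus: int = 8) -> List[str]:
    """Build the `ray start` argv for a head or worker node."""
    ipaddress.ip_address(head_ip)  # raises ValueError on a malformed address
    if is_head:
        return ["ray", "start", "--head", "--node-ip-address", head_ip,
                "--num-gpus", str(num_gpus), "--dashboard-host=0.0.0.0"]
    return ["ray", "start", f"--address={head_ip}:{RAY_GCS_PORT}",
            "--num-gpus", str(num_gpus)]


if __name__ == "__main__":
    # Print the shell form of both commands for a sample head IP.
    print(shlex.join(ray_start_args("10.0.0.1", is_head=True)))
    print(shlex.join(ray_start_args("10.0.0.1", is_head=False)))
```

Validating the IP up front turns a typo in `HEAD_NODE_IP` into an immediate error instead of a worker that silently fails to join.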
+### Start Training
+First, set the following environment variables:
+```bash
+export MODEL_PATH=YOUR_MODEL_PATH
+export CKPTS_DIR=YOUR_CKPTS_DIR
+export TRAIN_FILE=YOUR_TRAIN_FILE
+export TEST_FILE=YOUR_TEST_FILE
+export HOME=YOUR_HOME_PATH
+```
+Then run the following script to start training:
+```bash
+bash m1_train_script/campo_32b.sh
+```
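Because the training script takes its inputs from these environment variables, a quick pre-flight check can catch unset values or leftover `YOUR_*` placeholders before a multi-node job starts. A minimal sketch, assuming it runs from the repository root; `REQUIRED_VARS`, `missing_or_placeholder`, and `launch_training` are hypothetical names:

```python
import os
import subprocess
from typing import List, Mapping

# Variables listed in the export block above.
REQUIRED_VARS = ["MODEL_PATH", "CKPTS_DIR", "TRAIN_FILE", "TEST_FILE", "HOME"]


def missing_or_placeholder(env: Mapping[str, str]) -> List[str]:
    """Return variables that are unset, empty, or still a YOUR_* placeholder."""
    bad = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value or value.startswith("YOUR_"):
            bad.append(name)
    return bad


def launch_training(env: Mapping[str, str] = os.environ) -> None:
    """Check the environment, then run the 32B training script."""
    bad = missing_or_placeholder(env)
    if bad:
        raise SystemExit(f"please set: {', '.join(bad)}")
    # Relative path: assumes the current directory is the repository root.
    subprocess.run(["bash", "m1_train_script/campo_32b.sh"], check=True, env=dict(env))
```

Calling `launch_training()` performs the check and, only if it passes, runs the same `bash m1_train_script/campo_32b.sh` command.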
+## ⚖️ Run Evaluation
+We provide ready-to-use evaluation scripts in the `m1_eval_script/` directory for mathematical reasoning benchmarks.
+### Quick Start
+```bash
+# Evaluate on AIME 2024
+bash m1_eval_script/evaluate_7b_aime24.sh
+# Evaluate on AIME 2025
+bash m1_eval_script/evaluate_7b_aime25.sh
+# Evaluate on Math-500
+bash m1_eval_script/evaluate_7b_math500.sh
+```
+### Supported Benchmarks
+| Dataset | Script | Standard Runs |
+|---------|--------|---------------|
+| **AIME 2024** | `evaluate_7b_aime24.sh` | 64 runs |
+| **AIME 2025** | `evaluate_7b_aime25.sh` | 64 runs |
+| **Math-500** | `evaluate_7b_math500.sh` | 5 runs |
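The larger run counts for AIME are likely because AIME 2024 and AIME 2025 each contain only 30 problems, so a single run's accuracy moves in steps of 1/30 (about 3.3 points); averaging over many sampled runs reduces that variance, while Math-500's 500 problems need fewer runs. A minimal sketch of averaging per-run accuracy, assuming each run is recorded as a list of per-problem correct/incorrect flags (the scripts' actual aggregation may differ; the function names are hypothetical):

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def run_accuracy(correct: Sequence[bool]) -> float:
    """Accuracy of one run: fraction of problems answered correctly."""
    return sum(correct) / len(correct)


def averaged_accuracy(runs: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Mean and population standard deviation of per-run accuracy.

    Requires at least one run; each run must cover the same problem set.
    """
    accs: List[float] = [run_accuracy(r) for r in runs]
    return mean(accs), pstdev(accs)
```

Reporting the spread alongside the mean shows how much a single run could deviate from the averaged score.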
+### Results
+Results are saved in `results/[model_name]/[dataset_name]/` with:
+- `average_accuracy.txt`: Final accuracy score
+- `run[X]_inference_eval_results.csv`: Detailed results
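To compare benchmarks side by side, the per-dataset `average_accuracy.txt` files can be gathered into one mapping. A minimal sketch, assuming each file holds a single number (the exact file format is not documented here) and that `model_dir` points at `results/[model_name]/`; `collect_average_accuracy` is a hypothetical name:

```python
from pathlib import Path
from typing import Dict


def collect_average_accuracy(model_dir: Path) -> Dict[str, float]:
    """Map each dataset subdirectory of results/<model_name>/ to its score.

    Assumption: average_accuracy.txt contains one number, possibly with
    surrounding whitespace; adjust the parsing if the real format differs.
    """
    scores: Dict[str, float] = {}
    for path in sorted(model_dir.glob("*/average_accuracy.txt")):
        scores[path.parent.name] = float(path.read_text().strip())
    return scores
```

For example, `collect_average_accuracy(Path("results") / "<model_name>")` would return a dict keyed by dataset directory name; directories without the file are simply skipped.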
+## 🙏 Acknowledgement
+The RL training is built on the wonderful [`verl`](https://github.com/volcengine/verl) project.