miromindai nielsr HF Staff committed on
Commit
ed0092e
·
verified ·
1 Parent(s): fffb4c7

Enhance model card with metadata and sample usage (#1)

- Enhance model card with metadata and sample usage (511e455bf82b4762c53dabbd1a594367f93d2330)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +146 -4
README.md CHANGED
@@ -1,9 +1,15 @@
1
  ---
2
- license: apache-2.0
3
- language:
4
- - en
5
  base_model:
6
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
7
  ---
8
 
9
  <!-- markdownlint-disable first-line-h1 -->
@@ -24,7 +30,7 @@ base_model:
24
 
25
  </div>
26
 
27
-
28
 
29
  # MiroMind-M1
30
 
@@ -47,6 +53,7 @@ base_model:
47
  | OpenThoughts | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 83.2 |
48
  | Open-R1 | Qwen2.5-Math-7B-Instruct | 36.7 | 40.0 | 90.6 |
49
  | Synthetic-1 | Qwen2.5-7B-Instruct | 30.0 | 26.6 | 85.6 |
 
50
  | **MiroMind-SFT-7B** | Qwen2.5-Math-7B | 60.4 | 45.0 | 94.6 |
51
 
52
  *† means that the score of DeepSeek-R1 on AIME25 is from our evaluation.*
@@ -58,6 +65,7 @@ base_model:
58
  | DeepSeek-R1-0528 | 91.4 | 87.5 | – |
59
  | Qwen3-8B | 76.0 | 67.3 | – |
60
  | DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
 
61
  | <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |
62
  | DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
63
  | Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
@@ -79,3 +87,137 @@ base_model:
79
  ### Data
80
  [`MiroMind-M1-SFT-719K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-SFT-719K)<br>
81
  [`MiroMind-M1-RL-62K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)<br>
 
1
  ---
 
 
 
2
  base_model:
3
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
4
+ language:
5
+ - en
6
+ license: apache-2.0
7
+ pipeline_tag: text-generation
8
+ library_name: transformers
9
+ tags:
10
+ - mathematical-reasoning
11
+ - qwen
12
+ - causal-lm
13
  ---
14
 
15
  <!-- markdownlint-disable first-line-h1 -->
 
30
 
31
  </div>
32
 
33
+ This repository contains the MiroMind-M1-RL-32B model, part of the MiroMind-M1 series, described in the paper [MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization](https://huggingface.co/papers/2507.14683).
34
 
35
  # MiroMind-M1
36
 
 
53
  | OpenThoughts | Qwen2.5-7B-Instruct | 31.3 | 23.3 | 83.2 |
54
  | Open-R1 | Qwen2.5-Math-7B-Instruct | 36.7 | 40.0 | 90.6 |
55
  | Synthetic-1 | Qwen2.5-7B-Instruct | 30.0 | 26.6 | 85.6 |
56
+ | MiMo-7B-SFT | MiMo-7B-Base | 58.7 | 44.3 | 93.0 |
57
  | **MiroMind-SFT-7B** | Qwen2.5-Math-7B | 60.4 | 45.0 | 94.6 |
58
 
59
  *† means that the score of DeepSeek-R1 on AIME25 is from our evaluation.*
 
65
  | DeepSeek-R1-0528 | 91.4 | 87.5 | – |
66
  | Qwen3-8B | 76.0 | 67.3 | – |
67
  | DeepSeek-R1-0528-Qwen3-8B | 86.0 | 76.3 | – |
68
+ | MiMo-7B-RL | 68.2 | 55.4 | 95.8 |
69
  | <tr><td colspan="4" align="center"><em>**32B Models trained from Qwen2.5 series**</em></td></tr> |
70
  | DeepSeek-R1-Distill-Qwen-32B | 70.8 | 52.1 | 95.8 |
71
  | Skywork-OR1-32B-Preview | 77.1 | 68.2 | 97.5 |
 
87
  ### Data
88
  [`MiroMind-M1-SFT-719K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-SFT-719K)<br>
89
  [`MiroMind-M1-RL-62K`](https://huggingface.co/datasets/miromind-ai/MiroMind-M1-RL-62K)<br>
90
+
91
+ ## 🚀 Quickstart
92
+
93
+ You can explore the models using the Transformers library.
94
+
95
+ ```python
96
+ from transformers import AutoTokenizer, AutoModelForCausalLM
97
+ import torch
98
+
99
+ model_name = "miromind-ai/MiroMind-M1-RL-32B" # Or miromind-ai/MiroMind-M1-RL-7B
100
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
101
+ model = AutoModelForCausalLM.from_pretrained(
102
+ model_name,
103
+ torch_dtype=torch.bfloat16,
104
+ device_map="auto",
105
+ trust_remote_code=True
106
+ )
107
+
108
+ prompt = "Given the equation $2x + 5 = 11$, what is the value of $x$?"
109
+ messages = [
110
+ {"role": "user", "content": prompt}
111
+ ]
112
+ text = tokenizer.apply_chat_template(
113
+ messages,
114
+ tokenize=False,
115
+ add_generation_prompt=True
116
+ )
117
+
118
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
119
+
120
+ generated_ids = model.generate(
121
+ **model_inputs,  # forwards input_ids together with attention_mask
122
+ max_new_tokens=512
123
+ )
124
+ generated_ids = [
125
+ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
126
+ ]
127
+
128
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
129
+ print(response)
130
+ ```
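Math reasoning models commonly end their output with the final answer wrapped in `\boxed{...}`. This model card does not state that convention, so treat it as an assumption. Under that assumption, a minimal sketch for pulling the last boxed answer out of `response` might look like the following; `extract_boxed` is a hypothetical helper, not part of Transformers or the MiroMind-M1 repository.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last LaTeX boxed expression in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None  # no boxed answer present
    i = start + len(marker)
    depth = 1  # we are just past the opening brace of the box
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces, e.g. generation was cut off
```

Calling `extract_boxed(response)` after the snippet above would return `None` whenever the generation stops before the box is closed, which can happen with a small `max_new_tokens` budget.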
131
+
132
+ ## 🛠 Getting Started
133
+
134
+ ### Installation
135
+
136
+ Set up a virtualenv environment:
137
+
138
+ ```bash
139
+ git clone https://github.com/MiroMindAsia/MiroMind-M1.git
140
+ cd MiroMind-M1
141
+
142
+ # Create and activate a Python 3.10 virtual environment.
143
+ python3.10 -m pip install virtualenv
144
+ virtualenv -p python3.10 venv
145
+ source venv/bin/activate
146
+
147
+ # Install dependencies.
148
+ pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
149
+ pip3 install numpy psutil ninja packaging cmake
150
+ pip3 install flash_attn==2.7.4.post1 --no-build-isolation # This may take a while...
151
+ pip3 install -e .
152
+ ```
153
+
154
+ ## 🏋️ Training
155
+
156
+ ### Multi-Node Training
157
+
158
+ Here is a quick guide to starting Ray for multi-node training.
159
+
160
+ #### On the head node
161
+ ```bash
162
+ ray stop
163
+ ray start --head --node-ip-address $HEAD_NODE_IP --num-gpus 8 --dashboard-host=0.0.0.0
164
+ ```
165
+
166
+ #### On other nodes
167
+ ```bash
168
+ ray stop
169
+ ray start --address="$HEAD_NODE_IP:6379" --num-gpus 8
170
+ ```
171
+
172
+ ### Start Training
173
+
174
+ First, set the following variables:
175
+
176
+ ```bash
177
+ export MODEL_PATH=YOUR_MODEL_PATH
178
+ export CKPTS_DIR=YOUR_CKPTS_DIR
179
+ export TRAIN_FILE=YOUR_TRAIN_FILE
180
+ export TEST_FILE=YOUR_TEST_FILE
181
+ export HOME=YOUR_HOME_PATH
182
+ ```
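Because a missing variable may only surface once the training job is already running, it can help to check the environment first. The sketch below is not part of the MiroMind-M1 repository; `missing_vars` is a hypothetical helper, and the variable names are simply the ones exported above.

```python
import os
from typing import List, Mapping

# Variable names taken from the export block above.
REQUIRED_VARS = ("MODEL_PATH", "CKPTS_DIR", "TRAIN_FILE", "TEST_FILE", "HOME")


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A launcher could call `missing_vars()` and abort with a clear message if the returned list is non-empty.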
183
+
184
+ Then run the following script to start training:
185
+
186
+ ```bash
187
+ bash m1_train_script/campo_32b.sh
188
+ ```
189
+
190
+ ## ⚖️ Run Evaluation
191
+
192
+ We provide ready-to-use evaluation scripts in the `m1_eval_script/` directory for mathematical reasoning benchmarks.
193
+
194
+ ### Quick Start
195
+
196
+ ```bash
197
+ # Evaluate on AIME 2024
198
+ bash m1_eval_script/evaluate_7b_aime24.sh
199
+
200
+ # Evaluate on AIME 2025
201
+ bash m1_eval_script/evaluate_7b_aime25.sh
202
+
203
+ # Evaluate on Math-500
204
+ bash m1_eval_script/evaluate_7b_math500.sh
205
+ ```
206
+
207
+ ### Supported Benchmarks
208
+
209
+ | Dataset | Script | Standard Runs |
210
+ |---------|--------|---------------|
211
+ | **AIME 2024** | `evaluate_7b_aime24.sh` | 64 runs |
212
+ | **AIME 2025** | `evaluate_7b_aime25.sh` | 64 runs |
213
+ | **Math-500** | `evaluate_7b_math500.sh` | 5 runs |
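The "Standard Runs" column suggests that each benchmark is evaluated several times and the per-run accuracies are averaged. The card does not spell out the exact formula, so the following is only a sketch under that assumption; `average_accuracy` and the toy data are hypothetical and not taken from the evaluation scripts.

```python
from statistics import mean
from typing import Sequence


def average_accuracy(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of per-run accuracies; each run is a list of per-problem correctness flags."""
    if not runs or any(len(run) == 0 for run in runs):
        raise ValueError("need at least one non-empty run")
    per_run = [sum(run) / len(run) for run in runs]
    return mean(per_run)


# Toy data: 3 runs over 4 problems (the table above lists 64 runs for AIME).
toy_runs = [
    [True, False, True, True],
    [True, True, False, False],
    [True, True, True, True],
]
print(f"{average_accuracy(toy_runs):.3f}")
```

Averaging over many runs reduces the variance that sampling introduces on small benchmarks, which is presumably why the AIME sets use more runs than Math-500.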
214
+
215
+ ### Results
216
+
217
+ Results are saved in `results/[model_name]/[dataset_name]/` with:
218
+ - `average_accuracy.txt`: Final accuracy score
219
+ - `run[X]_inference_eval_results.csv`: Detailed results
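To compare several models or benchmarks at a glance, the per-directory score files can be gathered into one mapping. This is a minimal sketch that assumes only the directory layout described above; `collect_average_accuracies` is a hypothetical helper, and because the format of `average_accuracy.txt` is not documented here, the file contents are returned as raw text rather than parsed.

```python
from pathlib import Path
from typing import Dict


def collect_average_accuracies(results_root: str = "results") -> Dict[str, str]:
    """Map "model_name/dataset_name" to the raw text of its average_accuracy.txt."""
    summary = {}
    for path in sorted(Path(results_root).glob("*/*/average_accuracy.txt")):
        model_name, dataset_name = path.parts[-3], path.parts[-2]
        summary[f"{model_name}/{dataset_name}"] = path.read_text().strip()
    return summary
```

For example, `for key, value in collect_average_accuracies().items(): print(key, value)` would print one line per evaluated model and dataset pair.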
220
+
221
+ ## 🙏 Acknowledgement
222
+
223
+ The RL training is built on the wonderful [`verl`](https://github.com/volcengine/verl) project.