tc-mb committed · Commit f174180 · 0 parents

Initial commit: MiniCPM-V-4-gguf model

.DS_Store ADDED
Binary file (6.15 kB).
 
.gitattributes ADDED
@@ -0,0 +1,47 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ Model-3.6B-F16.gguf filter=lfs diff=lfs merge=lfs -text
37
+ ggml-model-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
38
+ ggml-model-Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
39
+ ggml-model-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
40
+ ggml-model-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
41
+ ggml-model-Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
42
+ ggml-model-Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
43
+ ggml-model-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
44
+ ggml-model-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
45
+ ggml-model-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
46
+ ggml-model-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
47
+ mmproj-model-f16.gguf filter=lfs diff=lfs merge=lfs -text
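The patterns above are ordinary gitattributes globs that route matching files through the LFS filter. Which paths they capture can be approximated with Python's `fnmatch`; this is a simplified sketch (real gitattributes matching has extra rules for `**`, path separators, and attribute precedence), and the pattern subset below is illustrative:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Illustrative subset of the LFS patterns declared in .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.zip", "*.safetensors", "mmproj-model-f16.gguf"]

def is_lfs_tracked(path: str) -> bool:
    """Approximate gitattributes matching: compare the file's base name
    against each pattern (real Git also matches full paths and `**`)."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)

print(is_lfs_tracked("ane_minicpmv4_vit_f16.mlmodelc/weights/weight.bin"))  # True
print(is_lfs_tracked("README.md"))  # False
```

Files matched this way are stored in the repo as small pointer stubs, with the actual payload kept in LFS storage.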
README.md ADDED
@@ -0,0 +1,649 @@
1
+ ---
2
+ pipeline_tag: image-text-to-text
3
+ datasets:
4
+ - openbmb/RLAIF-V-Dataset
5
+ library_name: transformers
6
+ language:
7
+ - multilingual
8
+ tags:
9
+ - minicpm-v
10
+ - vision
11
+ - ocr
12
+ - multi-image
13
+ - video
14
+ - custom_code
15
+ ---
16
+
17
+ <h1>A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone</h1>
18
+
19
+ [GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Demo](https://minicpm-v.openbmb.cn/)
20
+
21
+
22
+
23
+ ## MiniCPM-V 4.0
24
+
25
+ **MiniCPM-V 4.0** is the latest model in the MiniCPM-V series. The model is built in an end-to-end fashion on SigLip2-400M and MiniCPM4-3B, with a total of 4.1B parameters. It inherits the strong single-image, multi-image, and video understanding performance of MiniCPM-V 2.6, while largely improving efficiency. Notable features of MiniCPM-V 4.0 include:
26
+
27
+ - 🔥 **Leading Visual Capability.**
28
+ MiniCPM-V 4.0 achieves an average score of 69.0 on OpenCompass, a comprehensive evaluation of 8 popular benchmarks, outperforming both MiniCPM-V 2.6 (8.1B, 65.2) and Qwen2.5-VL-3B-Instruct (3.8B, 64.5). **With only 4.1B parameters, it surpasses the widely used proprietary model GPT-4.1-mini-20250414** for single image understanding. It also outperforms MiniCPM-V 2.6 on both multi-image understanding and video understanding.
29
+
30
+ - 🚀 **Superior Efficiency.**
31
+ Designed for performance on end devices, MiniCPM-V 4.0 runs smoothly on the **iPhone 16 Pro Max, delivering a decoding speed of 17.9 tokens/s**. Compared with the already efficient MiniCPM-V 2.6, MiniCPM-V 4.0 achieves a further 30% throughput boost while offering enhanced visual understanding.
32
+
33
+ - 💫 **Easy Usage.**
34
+ MiniCPM-V 4.0 can be easily used in various ways, including **llama.cpp, Ollama, vLLM, SGLang, LLaMA-Factory, and a local web demo**. Get started easily with our **well-structured [Cookbook](https://github.com/OpenSQZ/MiniCPM-V-CookBook)**, featuring detailed instructions and practical examples.
35
+
36
+
37
+ ### Evaluation
38
+
39
+ <details>
40
+ <summary>Click to view single-image results on OpenCompass.</summary>
41
+ <div align="center">
42
+ <table style="margin: 0px auto;">
43
+ <thead>
44
+ <tr>
45
+ <th nowrap="nowrap" align="left">model</th>
46
+ <th>Size</th>
47
+ <th>OpenCompass</th>
48
+ <th>OCRBench</th>
49
+ <th>MathVista</th>
50
+ <th>HallusionBench</th>
51
+ <th>MMMU</th>
52
+ <th>MMVet</th>
53
+ <th>MMBench V1.1</th>
54
+ <th>MMStar</th>
55
+ <th>AI2D</th>
56
+ </tr>
57
+ </thead>
58
+ <tbody align="center">
59
+ <tr>
60
+ <td colspan="11" align="left"><strong>Proprietary</strong></td>
61
+ </tr>
62
+ <tr>
63
+ <td nowrap="nowrap" align="left">GPT-4v-20240409</td>
64
+ <td>-</td>
65
+ <td>63.5</td>
66
+ <td>656</td>
67
+ <td>55.2</td>
68
+ <td>43.9</td>
69
+ <td>61.7</td>
70
+ <td>67.5</td>
71
+ <td>79.8</td>
72
+ <td>56.0</td>
73
+ <td>78.6</td>
74
+ </tr>
75
+ <tr>
76
+ <td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
77
+ <td>-</td>
78
+ <td>64.5</td>
79
+ <td>754</td>
80
+ <td>58.3</td>
81
+ <td>45.6</td>
82
+ <td>60.6</td>
83
+ <td>64.0</td>
84
+ <td>73.9</td>
85
+ <td>59.1</td>
86
+ <td>79.1</td>
87
+ </tr>
88
+ <tr>
89
+ <td nowrap="nowrap" align="left">GPT-4.1-mini-20250414</td>
90
+ <td>-</td>
91
+ <td>68.9</td>
92
+ <td>840</td>
93
+ <td>70.9</td>
94
+ <td>49.3</td>
95
+ <td>55.0</td>
96
+ <td>74.3</td>
97
+ <td>80.9</td>
98
+ <td>60.9</td>
99
+ <td>76.0</td>
100
+ </tr>
101
+ <tr>
102
+ <td nowrap="nowrap" align="left">Claude 3.5 Sonnet-20241022</td>
103
+ <td>-</td>
104
+ <td>70.6</td>
105
+ <td>798</td>
106
+ <td>65.3</td>
107
+ <td>55.5</td>
108
+ <td>66.4</td>
109
+ <td>70.1</td>
110
+ <td>81.7</td>
111
+ <td>65.1</td>
112
+ <td>81.2</td>
113
+ </tr>
114
+ <tr>
115
+ <td colspan="11" align="left"><strong>Open-source</strong></td>
116
+ </tr>
117
+ <tr>
118
+ <td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
119
+ <td>3.8B</td>
120
+ <td>64.5</td>
121
+ <td>828</td>
122
+ <td>61.2</td>
123
+ <td>46.6</td>
124
+ <td>51.2</td>
125
+ <td>60.0</td>
126
+ <td>76.8</td>
127
+ <td>56.3</td>
128
+ <td>81.4</td>
129
+ </tr>
130
+ <tr>
131
+ <td nowrap="nowrap" align="left">InternVL2.5-4B</td>
132
+ <td>3.7B</td>
133
+ <td>65.1</td>
134
+ <td>820</td>
135
+ <td>60.8</td>
136
+ <td>46.6</td>
137
+ <td>51.8</td>
138
+ <td>61.5</td>
139
+ <td>78.2</td>
140
+ <td>58.7</td>
141
+ <td>81.4</td>
142
+ </tr>
143
+ <tr>
144
+ <td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
145
+ <td>8.3B</td>
146
+ <td>70.9</td>
147
+ <td>888</td>
148
+ <td>68.1</td>
149
+ <td>51.9</td>
150
+ <td>58.0</td>
151
+ <td>69.7</td>
152
+ <td>82.2</td>
153
+ <td>64.1</td>
154
+ <td>84.3</td>
155
+ </tr>
156
+ <tr>
157
+ <td nowrap="nowrap" align="left">InternVL2.5-8B</td>
158
+ <td>8.1B</td>
159
+ <td>68.1</td>
160
+ <td>821</td>
161
+ <td>64.5</td>
162
+ <td>49.0</td>
163
+ <td>56.2</td>
164
+ <td>62.8</td>
165
+ <td>82.5</td>
166
+ <td>63.2</td>
167
+ <td>84.6</td>
168
+ </tr>
169
+ <tr>
170
+ <td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
171
+ <td>8.1B</td>
172
+ <td>65.2</td>
173
+ <td>852</td>
174
+ <td>60.8</td>
175
+ <td>48.1</td>
176
+ <td>49.8</td>
177
+ <td>60.0</td>
178
+ <td>78.0</td>
179
+ <td>57.5</td>
180
+ <td>82.1</td>
181
+ </tr>
182
+ <tr>
183
+ <td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
184
+ <td>8.7B</td>
185
+ <td>70.2</td>
186
+ <td>889</td>
187
+ <td>73.3</td>
188
+ <td>51.1</td>
189
+ <td>50.9</td>
190
+ <td>67.2</td>
191
+ <td>80.6</td>
192
+ <td>63.3</td>
193
+ <td>86.1</td>
194
+ </tr>
195
+ <tr>
196
+ <td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
197
+ <td>4.1B</td>
198
+ <td>69.0</td>
199
+ <td>894</td>
200
+ <td>66.9</td>
201
+ <td>50.8</td>
202
+ <td>51.2</td>
203
+ <td>68.0</td>
204
+ <td>79.7</td>
205
+ <td>62.8</td>
206
+ <td>82.9</td>
207
+ </tr>
208
+ </tbody>
209
+ </table>
210
+ </div>
211
+
212
+ </details>
213
+
214
+ <details>
215
+ <summary>Click to view single-image results on ChartQA, MME, RealWorldQA, TextVQA, DocVQA, MathVision, DynaMath, WeMath, Object HalBench, and MM HalBench.</summary>
216
+
217
+ <div align="center">
218
+ <table style="margin: 0px auto;">
219
+ <thead>
220
+ <tr>
221
+ <th nowrap="nowrap" align="left">model</th>
222
+ <th>Size</th>
223
+ <th>ChartQA</th>
224
+ <th>MME</th>
225
+ <th>RealWorldQA</th>
226
+ <th>TextVQA</th>
227
+ <th>DocVQA</th>
228
+ <th>MathVision</th>
229
+ <th>DynaMath</th>
230
+ <th>WeMath</th>
231
+ <th colspan="2">Obj Hal</th>
232
+ <th colspan="2">MM Hal</th>
233
+ </tr>
234
+ </thead>
235
+ <tbody>
236
+ <tr>
237
+ <td></td>
238
+ <td></td>
239
+ <td></td>
240
+ <td></td>
241
+ <td></td>
242
+ <td></td>
243
+ <td></td>
244
+ <td></td>
245
+ <td></td>
246
+ <td></td>
247
+ <td>CHAIRs↓</td>
248
+ <td>CHAIRi↓</td>
249
+ <td nowrap="nowrap">score avg@3↑</td>
250
+ <td nowrap="nowrap">hall rate avg@3↓</td>
251
+ </tr>
252
+ <tbody align="center">
253
+ <tr>
254
+ <td colspan="14" align="left"><strong>Proprietary</strong></td>
255
+ </tr>
256
+ <tr>
257
+ <td nowrap="nowrap" align="left">GPT-4v-20240409</td>
258
+ <td>-</td>
259
+ <td>78.5</td>
260
+ <td>1927</td>
261
+ <td>61.4</td>
262
+ <td>78.0</td>
263
+ <td>88.4</td>
264
+ <td>-</td>
265
+ <td>-</td>
266
+ <td>-</td>
267
+ <td>-</td>
268
+ <td>-</td>
269
+ <td>-</td>
270
+ <td>-</td>
271
+ </tr>
272
+ <tr>
273
+ <td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
274
+ <td>-</td>
275
+ <td>87.2</td>
276
+ <td>-</td>
277
+ <td>67.5</td>
278
+ <td>78.8</td>
279
+ <td>93.1</td>
280
+ <td>41.0</td>
281
+ <td>31.5</td>
282
+ <td>50.5</td>
283
+ <td>-</td>
284
+ <td>-</td>
285
+ <td>-</td>
286
+ <td>-</td>
287
+ </tr>
288
+ <tr>
289
+ <td nowrap="nowrap" align="left">GPT-4.1-mini-20250414</td>
290
+ <td>-</td>
291
+ <td>-</td>
292
+ <td>-</td>
293
+ <td>-</td>
294
+ <td>-</td>
295
+ <td>-</td>
296
+ <td>45.3</td>
297
+ <td>47.7</td>
298
+ <td>-</td>
299
+ <td>-</td>
300
+ <td>-</td>
301
+ <td>-</td>
302
+ <td>-</td>
303
+ </tr>
304
+ <tr>
305
+ <td nowrap="nowrap" align="left">Claude 3.5 Sonnet-20241022</td>
306
+ <td>-</td>
307
+ <td>90.8</td>
308
+ <td>-</td>
309
+ <td>60.1</td>
310
+ <td>74.1</td>
311
+ <td>95.2</td>
312
+ <td>35.6</td>
313
+ <td>35.7</td>
314
+ <td>44.0</td>
315
+ <td>-</td>
316
+ <td>-</td>
317
+ <td>-</td>
318
+ <td>-</td>
319
+ </tr>
320
+ <tr>
321
+ <td colspan="14" align="left"><strong>Open-source</strong></td>
322
+ </tr>
323
+ <tr>
324
+ <td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
325
+ <td>3.8B</td>
326
+ <td>84.0</td>
327
+ <td>2157</td>
328
+ <td>65.4</td>
329
+ <td>79.3</td>
330
+ <td>93.9</td>
331
+ <td>21.9</td>
332
+ <td>13.2</td>
333
+ <td>22.9</td>
334
+ <td>18.3</td>
335
+ <td>10.8</td>
336
+ <td>3.9 </td>
337
+ <td>33.3 </td>
338
+ </tr>
339
+ <tr>
340
+ <td nowrap="nowrap" align="left">InternVL2.5-4B</td>
341
+ <td>3.7B</td>
342
+ <td>84.0</td>
343
+ <td>2338</td>
344
+ <td>64.3</td>
345
+ <td>76.8</td>
346
+ <td>91.6</td>
347
+ <td>18.4</td>
348
+ <td>15.2</td>
349
+ <td>21.2</td>
350
+ <td>13.7</td>
351
+ <td>8.7</td>
352
+ <td>3.2 </td>
353
+ <td>46.5 </td>
354
+ </tr>
355
+ <tr>
356
+ <td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
357
+ <td>8.3B</td>
358
+ <td>87.3</td>
359
+ <td>2347</td>
360
+ <td>68.5</td>
361
+ <td>84.9</td>
362
+ <td>95.7</td>
363
+ <td>25.4</td>
364
+ <td>21.8</td>
365
+ <td>36.2</td>
366
+ <td>13.3</td>
367
+ <td>7.9</td>
368
+ <td>4.1 </td>
369
+ <td>31.6 </td>
370
+ </tr>
371
+ <tr>
372
+ <td nowrap="nowrap" align="left">InternVL2.5-8B</td>
373
+ <td>8.1B</td>
374
+ <td>84.8</td>
375
+ <td>2344</td>
376
+ <td>70.1</td>
377
+ <td>79.1</td>
378
+ <td>93.0</td>
379
+ <td>17.0</td>
380
+ <td>9.4</td>
381
+ <td>23.5</td>
382
+ <td>18.3</td>
383
+ <td>11.6</td>
384
+ <td>3.6 </td>
385
+ <td>37.2</td>
386
+ </tr>
387
+ <tr>
388
+ <td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
389
+ <td>8.1B</td>
390
+ <td>79.4</td>
391
+ <td>2348</td>
392
+ <td>65.0</td>
393
+ <td>80.1</td>
394
+ <td>90.8</td>
395
+ <td>17.5</td>
396
+ <td>9.0</td>
397
+ <td>20.4</td>
398
+ <td>7.3</td>
399
+ <td>4.7</td>
400
+ <td>4.0 </td>
401
+ <td>29.9 </td>
402
+ </tr>
403
+ <tr>
404
+ <td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
405
+ <td>8.7B</td>
406
+ <td>86.9</td>
407
+ <td>2372</td>
408
+ <td>68.1</td>
409
+ <td>82.0</td>
410
+ <td>93.5</td>
411
+ <td>21.7</td>
412
+ <td>10.4</td>
413
+ <td>25.2</td>
414
+ <td>6.3</td>
415
+ <td>3.4</td>
416
+ <td>4.1 </td>
417
+ <td>31.3 </td>
418
+ </tr>
419
+ <tr>
420
+ <td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
421
+ <td>4.1B</td>
422
+ <td>84.4</td>
423
+ <td>2298</td>
424
+ <td>68.5</td>
425
+ <td>80.8</td>
426
+ <td>92.9</td>
427
+ <td>20.7</td>
428
+ <td>14.2</td>
429
+ <td>32.7</td>
430
+ <td>6.3</td>
431
+ <td>3.5</td>
432
+ <td>4.1 </td>
433
+ <td>29.2 </td>
434
+ </tr>
435
+ </tbody>
436
+ </table>
437
+ </div>
438
+
439
+ </details>
440
+
441
+ <details>
442
+ <summary>Click to view multi-image and video understanding results on Mantis, Blink and Video-MME. </summary>
443
+ <div align="center">
444
+ <table style="margin: 0px auto;">
445
+ <thead>
446
+ <tr>
447
+ <th nowrap="nowrap" align="left">model</th>
448
+ <th>Size</th>
449
+ <th>Mantis</th>
450
+ <th>Blink</th>
451
+ <th nowrap="nowrap" colspan="2" >Video-MME</th>
452
+ </tr>
453
+ </thead>
454
+ <tbody>
455
+ <tr>
456
+ <td></td>
457
+ <td></td>
458
+ <td></td>
459
+ <td></td>
460
+ <td>w/o subs</td>
461
+ <td>w/ subs</td>
462
+ </tr>
463
+ <tbody align="center">
464
+ <tr>
465
+ <td colspan="6" align="left"><strong>Proprietary</strong></td>
466
+ </tr>
467
+ <tr>
468
+ <td nowrap="nowrap" align="left">GPT-4v-20240409</td>
469
+ <td>-</td>
470
+ <td>62.7</td>
471
+ <td>54.6</td>
472
+ <td>59.9</td>
473
+ <td>63.3</td>
474
+ </tr>
475
+ <tr>
476
+ <td nowrap="nowrap" align="left">Gemini-1.5-Pro</td>
477
+ <td>-</td>
478
+ <td>-</td>
479
+ <td>59.1</td>
480
+ <td>75.0</td>
481
+ <td>81.3</td>
482
+ </tr>
483
+ <tr>
484
+ <td nowrap="nowrap" align="left">GPT-4o-20240513</td>
485
+ <td>-</td>
486
+ <td>-</td>
487
+ <td>68.0</td>
488
+ <td>71.9</td>
489
+ <td>77.2</td>
490
+ </tr>
491
+ <tr>
492
+ <td colspan="6" align="left"><strong>Open-source</strong></td>
493
+ </tr>
494
+ <tr>
495
+ <td nowrap="nowrap" align="left">Qwen2.5-VL-3B-Instruct</td>
496
+ <td>3.8B</td>
497
+ <td>-</td>
498
+ <td>47.6</td>
499
+ <td>61.5</td>
500
+ <td>67.6</td>
501
+ </tr>
502
+ <tr>
503
+ <td nowrap="nowrap" align="left">InternVL2.5-4B</td>
504
+ <td>3.7B</td>
505
+ <td>62.7</td>
506
+ <td>50.8</td>
507
+ <td>62.3</td>
508
+ <td>63.6</td>
509
+ </tr>
510
+ <tr>
511
+ <td nowrap="nowrap" align="left">Qwen2.5-VL-7B-Instruct</td>
512
+ <td>8.3B</td>
513
+ <td>-</td>
514
+ <td>56.4</td>
515
+ <td>65.1</td>
516
+ <td>71.6</td>
517
+ </tr>
518
+ <tr>
519
+ <td nowrap="nowrap" align="left">InternVL2.5-8B</td>
520
+ <td>8.1B</td>
521
+ <td>67.7</td>
522
+ <td>54.8</td>
523
+ <td>64.2</td>
524
+ <td>66.9</td>
525
+ </tr>
526
+ <tr>
527
+ <td nowrap="nowrap" align="left">MiniCPM-V-2.6</td>
528
+ <td>8.1B</td>
529
+ <td>69.1</td>
530
+ <td>53.0</td>
531
+ <td>60.9</td>
532
+ <td>63.6</td>
533
+ </tr>
534
+ <tr>
535
+ <td nowrap="nowrap" align="left">MiniCPM-o-2.6</td>
536
+ <td>8.7B</td>
537
+ <td>71.9</td>
538
+ <td>56.7</td>
539
+ <td>63.9</td>
540
+ <td>69.6</td>
541
+ </tr>
542
+ <tr>
543
+ <td nowrap="nowrap" align="left">MiniCPM-V-4.0</td>
544
+ <td>4.1B</td>
545
+ <td>71.4</td>
546
+ <td>54.0</td>
547
+ <td>61.2</td>
548
+ <td>65.8</td>
549
+ </tr>
550
+ </tbody>
551
+ </table>
552
+ </div>
553
+
554
+ </details>
555
+
556
+ ### Examples
557
+
558
+ <div style="display: flex; flex-direction: column; align-items: center;">
559
+ <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/minicpm-v-4-case.png" alt="math" style="margin-bottom: 5px;">
560
+ </div>
561
+
562
+ Run locally on iPhone 16 Pro Max with [iOS demo](https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/ios_demo/ios.md).
563
+
564
+ <div align="center">
565
+ <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/iphone_en.gif" width="45%" style="display: inline-block; margin: 0 10px;"/>
566
+ <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/iphone_en_information_extraction.gif" width="45%" style="display: inline-block; margin: 0 10px;"/>
567
+ </div>
568
+
569
+ <div align="center">
570
+ <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/iphone_cn.gif" width="45%" style="display: inline-block; margin: 0 10px;"/>
571
+ <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4/iphone_cn_funny_points.gif" width="45%" style="display: inline-block; margin: 0 10px;"/>
572
+ </div>
573
+
574
+ ## Usage
575
+
576
+ ```python
577
+ from PIL import Image
578
+ import torch
579
+ from transformers import AutoModel, AutoTokenizer
580
+
581
+ model_path = 'openbmb/MiniCPM-V-4'
582
+ model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
583
+ # sdpa or flash_attention_2, no eager
584
+ attn_implementation='sdpa', torch_dtype=torch.bfloat16)
585
+ model = model.eval().cuda()
586
+ tokenizer = AutoTokenizer.from_pretrained(
587
+ model_path, trust_remote_code=True)
588
+
589
+
590
+
591
+ image = Image.open('./assets/single.png').convert('RGB')
592
+ # Optional preview: display() works only in notebooks; use image.show() in a plain script.
593
+
594
+ # First round chat
595
+ question = "What is the landform in the picture?"
596
+ msgs = [{'role': 'user', 'content': [image, question]}]
597
+
598
+ answer = model.chat(
599
+ msgs=msgs,
600
+ image=image,
601
+ tokenizer=tokenizer
602
+ )
603
+ print(answer)
604
+
605
+
606
+ # Second round chat, pass history context of multi-turn conversation
607
+ msgs.append({"role": "assistant", "content": [answer]})
608
+ msgs.append({"role": "user", "content": [
609
+ "What should I pay attention to when traveling here?"]})
610
+
611
+ answer = model.chat(
612
+ msgs=msgs,
613
+ image=None,
614
+ tokenizer=tokenizer
615
+ )
616
+ print(answer)
617
+ ```
618
+
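Since this repo ships GGUF files, the model can also be run directly with llama.cpp instead of transformers. A hedged sketch of the invocation — the `llama-mtmd-cli` binary name and flags below are assumptions that vary across llama.cpp versions, so check `--help` for your build:

```shell
# Assumed llama.cpp multimodal CLI; adjust the binary name and flags for your build.
./llama-mtmd-cli \
  -m ggml-model-Q4_K_M.gguf \
  --mmproj mmproj-model-f16.gguf \
  --image ./assets/single.png \
  -p "What is the landform in the picture?"
```

The `mmproj-model-f16.gguf` file carries the vision projector and must be passed alongside the quantized language model.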
619
+
620
+ ## License
621
+ #### Model License
622
+ * The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
623
+ * The usage of MiniCPM-V series model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
624
+ * The models and weights of MiniCPM are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, MiniCPM-V 2.6 weights are also available for free commercial use.
625
+
626
+
627
+ #### Statement
628
+ * As an LMM, MiniCPM-V 4.0 generates content by learning from a large amount of multimodal corpora, but it cannot comprehend, express personal opinions, or make value judgments. Anything generated by MiniCPM-V 4.0 does not represent the views and positions of the model developers.
629
+ * We will not be liable for any problems arising from the use of the MiniCPM-V models, including but not limited to data security issues, risks of public opinion, or any risks and problems arising from misdirection, misuse, or dissemination of the models.
630
+
631
+ ## Key Techniques and Other Multimodal Projects
632
+
633
+ 👏 Welcome to explore the key techniques behind MiniCPM-V and other multimodal projects from our team:
634
+
635
+ [VisCPM](https://github.com/OpenBMB/VisCPM/tree/main) | [RLHF-V](https://github.com/RLHF-V/RLHF-V) | [LLaVA-UHD](https://github.com/thunlp/LLaVA-UHD) | [RLAIF-V](https://github.com/RLHF-V/RLAIF-V)
636
+
637
+ ## Citation
638
+
639
+ If you find our work helpful, please consider citing our papers 📝 and liking this project ❤️!
640
+
641
+ ```bib
642
+ @article{yao2024minicpm,
643
+ title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
644
+ author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
645
+ journal={Nature Communications},
+ volume={16},
+ pages={5509},
646
+ year={2025}
647
+ }
648
+ ```
649
+
ane_minicpmv4_vit_f16.mlmodelc.zip ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7751a5f889396c72622d90113f80b0ce0abd8efeeb63011388047a77a1fc1482
3
+ size 635559301
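Each of the `ADDED` stubs like the one above is a Git LFS pointer file: three `key value` lines giving the spec version, the SHA-256 object ID, and the payload size in bytes. A minimal parser:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (version / oid / size lines)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # payload size in bytes
    return fields

# Pointer content copied from ane_minicpmv4_vit_f16.mlmodelc.zip above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7751a5f889396c72622d90113f80b0ce0abd8efeeb63011388047a77a1fc1482
size 635559301
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 635559301
```

`git lfs pull` replaces these stubs with the actual binaries on checkout.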
ane_minicpmv4_vit_f16.mlmodelc/analytics/coremldata.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b5bb3712cb631ef6c4a430b02994c3b12eeb007b64c486a4b1e752fea26652c4
3
+ size 243
ane_minicpmv4_vit_f16.mlmodelc/coremldata.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9892bfc4a858d381faa4176acce8733b89eb4f0260b90432184e24be7d44e9c2
3
+ size 713
ane_minicpmv4_vit_f16.mlmodelc/metadata.json ADDED
@@ -0,0 +1,85 @@
1
+ [
2
+ {
3
+ "shortDescription" : "MiniCPM-V 4.0 vit on ANE",
4
+ "metadataOutputVersion" : "3.0",
5
+ "outputSchema" : [
6
+ {
7
+ "hasShapeFlexibility" : "0",
8
+ "isOptional" : "0",
9
+ "dataType" : "Float32",
10
+ "formattedType" : "MultiArray (Float32 1 × 1024 × 1152)",
11
+ "shortDescription" : "",
12
+ "shape" : "[1, 1024, 1152]",
13
+ "name" : "output",
14
+ "type" : "MultiArray"
15
+ }
16
+ ],
17
+ "version" : "4.0.0",
18
+ "modelParameters" : [
19
+
20
+ ],
21
+ "author" : "tianchi",
22
+ "specificationVersion" : 6,
23
+ "storagePrecision" : "Float16",
24
+ "license" : "Apache 2.0",
25
+ "mlProgramOperationTypeHistogram" : {
26
+ "Linear" : 162,
27
+ "Matmul" : 54,
28
+ "Cast" : 2,
29
+ "Softmax" : 27,
30
+ "Mul" : 27,
31
+ "Transpose" : 108,
32
+ "LayerNorm" : 55,
33
+ "Add" : 54,
34
+ "Reshape" : 108,
35
+ "Gelu" : 27
36
+ },
37
+ "computePrecision" : "Mixed (Float16, Float32, Int32)",
38
+ "stateSchema" : [
39
+
40
+ ],
41
+ "isUpdatable" : "0",
42
+ "availability" : {
43
+ "macOS" : "12.0",
44
+ "tvOS" : "15.0",
45
+ "visionOS" : "1.0",
46
+ "watchOS" : "8.0",
47
+ "iOS" : "15.0",
48
+ "macCatalyst" : "15.0"
49
+ },
50
+ "modelType" : {
51
+ "name" : "MLModelType_mlProgram"
52
+ },
53
+ "inputSchema" : [
54
+ {
55
+ "hasShapeFlexibility" : "0",
56
+ "isOptional" : "0",
57
+ "dataType" : "Float32",
58
+ "formattedType" : "MultiArray (Float32 1 × 1024 × 1152)",
59
+ "shortDescription" : "",
60
+ "shape" : "[1, 1024, 1152]",
61
+ "name" : "input",
62
+ "type" : "MultiArray"
63
+ }
64
+ ],
65
+ "userDefinedMetadata" : {
66
+ "converter" : "coremltools",
67
+ "compute_units" : "ALL",
68
+ "deployment_target" : "iOS15+",
69
+ "model_id" : "61a76759-ea68-463d-bbdf-bb8ded301a81",
70
+ "com.github.apple.coremltools.version" : "8.3.0",
71
+ "base_model" : "MiniCPM-V4",
72
+ "owner" : "tianchi",
73
+ "input_shape" : "torch.Size([1, 1024, 1152])",
74
+ "batch_size" : "1",
75
+ "precision" : "float16",
76
+ "target_device" : "ANE",
77
+ "com.github.apple.coremltools.source" : "torch==2.6.0",
78
+ "com.github.apple.coremltools.source_dialect" : "TorchScript",
79
+ "framework" : "pytorch",
80
+ "model_type" : "vision_transformer"
81
+ },
82
+ "generatedClassName" : "ane_minicpmv4_vit_f16",
83
+ "method" : "predict"
84
+ }
85
+ ]
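The `mlProgramOperationTypeHistogram` in the metadata above can be tallied to see how the ANE-compiled ViT program is composed. A small sketch with the histogram values copied inline:

```python
# Operation-type histogram copied from metadata.json above.
op_histogram = {
    "Linear": 162, "Matmul": 54, "Cast": 2, "Softmax": 27, "Mul": 27,
    "Transpose": 108, "LayerNorm": 55, "Add": 54, "Reshape": 108, "Gelu": 27,
}

total_ops = sum(op_histogram.values())           # total MIL operations in the program
most_common = max(op_histogram, key=op_histogram.get)
print(total_ops, most_common)  # 624 Linear
```

The counts are consistent with a 27-block transformer: 27 each of Softmax and Gelu (one attention and one MLP activation per block), with Linear layers dominating.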
ane_minicpmv4_vit_f16.mlmodelc/model.mil ADDED
The diff for this file is too large to render. See raw diff
 
ane_minicpmv4_vit_f16.mlmodelc/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3cd718dae817f3582db314ab114d4fab3cdc3e8e066bd4b5041113b2ca8a16ad
3
+ size 822966528
ggml-model-Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f3eab8d69253a22ae2d5b06e2732be10c792b20b07141a51e30768bb03cc6ccc
3
+ size 2079023456
ggml-model-Q4_1.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b97c6ad1f690c9528835ee0faa87dc1dbc40a6fb707d82b2cce2dd616a7d9cc3
3
+ size 2292626016
ggml-model-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b0ff610e9c92b30389ff1e0dd40fffed3c1f02a9d34a735fd5fba6a5ad25672b
3
+ size 2189861216
ggml-model-Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5b02ff32d3537207c7a160f8dcf2aa792c40fe1be2fed7ef9001aae4e802058f
3
+ size 2092458336
ggml-model-Q5_0.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6da0b86200388b15ac8c33940abfb961f7e51d7107a0776ed6ddaa819e4ae29a
3
+ size 2506228576
ggml-model-Q5_1.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d67ed4cc239c82736f5beb185e84f449dbf1044658668aaf4281bcec5e0a4253
3
+ size 2719831136
ggml-model-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cd643b2cc310df9b6b2ce3a6bdf18eb16a59485bd4c6bb962ff700ad6a153b03
3
+ size 2563326816
ggml-model-Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2643b82737a05b5149f440e276c281918a579f7ecbf06a48a7ae5ef7f96162ec
3
+ size 2506228576
ggml-model-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fc78a557d5bcc3cb21f99667f97fe3e0381fc129d586d1532fffe1926ec53bd1
3
+ size 2960134016
ggml-model-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4618297c8a2ec285c84dd219d6daaaecd4359a8c92a1fc9bb0d629928be44bad
3
+ size 3833381696
mmproj-model-f16.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f0faa9ae63532300999c86a196f140c716cd0fbb08bbbd81850f1f9a631f7761
3
+ size 958777792
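The LFS pointer sizes above give a rough bits-per-weight figure for each quantization. This sketch assumes the language model holds about 3.6B parameters (suggested by the `Model-3.6B-F16.gguf` naming in `.gitattributes` — an assumption, since GGUF files also carry metadata and some tensors kept at higher precision, so the numbers are slightly inflated estimates):

```python
# File sizes in bytes, copied from the LFS pointers above.
gguf_sizes = {
    "Q4_0": 2_079_023_456,
    "Q8_0": 3_833_381_696,
}
N_PARAMS = 3.6e9  # assumed parameter count of the language model

for quant, size in gguf_sizes.items():
    bpw = size * 8 / N_PARAMS  # rough bits per weight
    print(f"{quant}: ~{bpw:.2f} bits/weight")
```

The estimates land near the nominal costs of these formats (about 4.5 bpw for Q4_0 and 8.5 bpw for Q8_0), which is a quick sanity check that a file downloaded from this repo is complete.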