Commit 2894c1a (verified) by lbourdois · 1 Parent(s): e9374e1

Improve language tag

Hi! Since the model is multilingual, this PR adds languages other than English to the language tag to improve its referencing. Note that the README announces 29 languages but explicitly lists only 13, so I was only able to add those 13.
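
For anyone who wants to apply the same kind of change to another model card, here is a minimal sketch of the edit in plain Python. It is not the tooling used for this PR: the helper name `add_language_tags` and the shortened `README` string are hypothetical, and the sketch assumes the front matter is delimited by `---` lines and has no `language:` key yet. The 13 codes are the ones added in this commit.

```python
def add_language_tags(readme: str, languages: list[str]) -> str:
    """Append a `language:` list to the end of a model card's YAML front matter."""
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README has no YAML front matter")
    # Locate the closing '---' of the front matter.
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("front matter is not closed") from None
    if any(line.startswith("language:") for line in lines[1:end]):
        raise ValueError("front matter already has a language key")
    block = ["language:"] + [f"- {code}" for code in languages]
    return "\n".join(lines[:end] + block + lines[end:]) + "\n"


# Shortened, hypothetical stand-in for the real README.md.
README = """---
license: apache-2.0
pipeline_tag: text-generation
---

# MLM-Filter-Qwen2.5-1.5B-GPT4o Model Card
"""

# The 13 language codes added by this commit.
LANGUAGES = ["zho", "eng", "fra", "spa", "por", "deu", "ita",
             "rus", "jpn", "kor", "vie", "tha", "ara"]

updated = add_language_tags(README, LANGUAGES)
print(updated)
```

The sketch edits the text directly rather than round-tripping through a YAML parser, so the existing keys keep their order and formatting.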

Files changed (1)
  1. README.md +70 -56
README.md CHANGED
@@ -1,57 +1,71 @@
- ---
- license: apache-2.0
- datasets:
- - weizhiwang/mlm_filter_instructions
- - Lin-Chen/ShareGPT4V
- base_model:
- - Qwen/Qwen2.5-1.5B-Instruct
- - google/siglip-so400m-patch14-384
- pipeline_tag: text-generation
- ---
-
-
- # MLM-Filter-Qwen2.5-1.5B-GPT4o Model Card
-
- ## Model details
-
- **Model type:**
- MLM-Filter-Qwen2.5-1.5B-GPT4o is an open-source MLLM trained to assess the data quality of image-text paired data. It can generate 4 quality metrics for image-text data: Image Text Matching, Object Detail Fulfillment, Caption Text Quality, and Semantic Understanding.
-
- **Model date:**
- MLM-Filter-Qwen2.5-1.5B-GPT4o was trained in Dec 2024.
-
- **Paper or resources for more information:**
- https://mlm-filter.github.io/
-
- ```
- @article{wang2024finetuned,
- title={Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters},
- author={Wang, Weizhi and Mrini, Khalil and Yang, Linjie and Kumar, Sateesh and Tian, Yu and Yan, Xifeng and Wang, Heng},
- journal={arXiv preprint arXiv:2403.02677},
- year={2024}
- }
- ```
-
- ## License
- Apache-2.0
-
- **Where to send questions or comments about the model:**
- https://github.com/Victorwz/MLM_Filter/issues
-
- ## Intended use
- **Primary intended uses:**
- MLM-Filter can be used as a drop-in replacement for CLIPScore in these tasks:
-
- 1. Score image-text data in large-scale pre-training dataset and then filter high-quality subsets based on the scores (For training MLLMs or VLMs, please consider to jointly use the Image-Text Matching score and the Object Detail Fulfillment score);
-
- 2. Evaluate the image-text alignment for image2text or text2image generation models;
-
- 3. Any potential applications with the need to calculate the image-text alignment.
-
-
- ## Training dataset (709K)
- - 665k ShareGPT4V data.
- - 44k instructions on image-text data quality assessment tasks ranging across 4 metrics.
-
- ## Usage Sample
+ ---
+ license: apache-2.0
+ datasets:
+ - weizhiwang/mlm_filter_instructions
+ - Lin-Chen/ShareGPT4V
+ base_model:
+ - Qwen/Qwen2.5-1.5B-Instruct
+ - google/siglip-so400m-patch14-384
+ pipeline_tag: text-generation
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+
+ # MLM-Filter-Qwen2.5-1.5B-GPT4o Model Card
+
+ ## Model details
+
+ **Model type:**
+ MLM-Filter-Qwen2.5-1.5B-GPT4o is an open-source MLLM trained to assess the data quality of image-text paired data. It can generate 4 quality metrics for image-text data: Image Text Matching, Object Detail Fulfillment, Caption Text Quality, and Semantic Understanding.
+
+ **Model date:**
+ MLM-Filter-Qwen2.5-1.5B-GPT4o was trained in Dec 2024.
+
+ **Paper or resources for more information:**
+ https://mlm-filter.github.io/
+
+ ```
+ @article{wang2024finetuned,
+ title={Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters},
+ author={Wang, Weizhi and Mrini, Khalil and Yang, Linjie and Kumar, Sateesh and Tian, Yu and Yan, Xifeng and Wang, Heng},
+ journal={arXiv preprint arXiv:2403.02677},
+ year={2024}
+ }
+ ```
+
+ ## License
+ Apache-2.0
+
+ **Where to send questions or comments about the model:**
+ https://github.com/Victorwz/MLM_Filter/issues
+
+ ## Intended use
+ **Primary intended uses:**
+ MLM-Filter can be used as a drop-in replacement for CLIPScore in these tasks:
+
+ 1. Score image-text data in large-scale pre-training dataset and then filter high-quality subsets based on the scores (For training MLLMs or VLMs, please consider to jointly use the Image-Text Matching score and the Object Detail Fulfillment score);
+
+ 2. Evaluate the image-text alignment for image2text or text2image generation models;
+
+ 3. Any potential applications with the need to calculate the image-text alignment.
+
+
+ ## Training dataset (709K)
+ - 665k ShareGPT4V data.
+ - 44k instructions on image-text data quality assessment tasks ranging across 4 metrics.
+
+ ## Usage Sample
  Please follow the instructions in https://github.com/Victorwz/MLM_Filter.
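
The first intended use listed in the README (score the pairs, then keep a high-quality subset by using the Image-Text Matching and Object Detail Fulfillment scores jointly) could look roughly like the sketch below. It does not call the model: the `ScoredPair` type, the `filter_pairs` helper, the sample scores, and the threshold values are all hypothetical and only illustrate the filtering step once scores are available.

```python
from dataclasses import dataclass


@dataclass
class ScoredPair:
    image_path: str
    caption: str
    itm: int  # Image-Text Matching score (hypothetical value)
    odf: int  # Object Detail Fulfillment score (hypothetical value)


def filter_pairs(pairs, min_itm, min_odf):
    """Keep only pairs whose ITM and ODF scores both reach their thresholds."""
    return [p for p in pairs if p.itm >= min_itm and p.odf >= min_odf]


# Made-up scores standing in for real model outputs.
pairs = [
    ScoredPair("a.jpg", "A red bicycle leaning on a brick wall.", itm=92, odf=88),
    ScoredPair("b.jpg", "Click here for more photos", itm=15, odf=10),
    ScoredPair("c.jpg", "A dog.", itm=90, odf=40),
]

# Thresholds are illustrative, not recommended values.
kept = filter_pairs(pairs, min_itm=85, min_odf=85)
for pair in kept:
    print(pair.image_path, pair.caption)
```

Requiring both scores to pass means a caption that matches the image but omits most of its detail (like `c.jpg` here) is dropped along with clearly unrelated text.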