On Data Scaling in Masked Image Modeling

Paper: [arXiv:2206.04664](https://arxiv.org/abs/2206.04664)
This repository primarily stores the SimMIM-pretrained SwinV2 models used in the "On Data Scaling in Masked Image Modeling" study. If you have any questions about SimMIM or the data scaling study, please file an issue in this repository or contact [email protected] directly. Please note that the Microsoft-managed SimMIM and Swin-Transformer repositories are no longer maintained by me.
You can use the direct links in the table below to download the checkpoints, or download them from Python with the huggingface_hub library, as in the sketch that follows.
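For example, here is a minimal sketch using `hf_hub_download`; the repository id and checkpoint filename are placeholders (not taken from this page), so substitute the actual repository and the file linked in the table below.

```python
# Minimal sketch: fetch one checkpoint with huggingface_hub.
# NOTE: repo_id and filename are placeholders -- replace them with the actual
# repository id and the checkpoint filename linked in the table below.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<user>/<this-repo>",                 # placeholder repository id
    filename="simmim_swinv2_base_in1k_500k.pth",  # placeholder checkpoint filename
)
print(ckpt_path)  # local path to the cached checkpoint file
```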
| name | model size (params) | pre-training dataset | pre-training iterations | validation loss | fine-tuned acc@1 (%) | pre-trained model | fine-tuned model |
|---|---|---|---|---|---|---|---|
| SwinV2-Small | 49M | ImageNet-1K 10% | 125k | 0.4820 | 82.69 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 10% | 250k | 0.4961 | 83.11 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 10% | 500k | 0.5115 | 83.17 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 125k | 0.4751 | 83.05 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 250k | 0.4722 | 83.56 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 20% | 500k | 0.4734 | 83.75 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 125k | 0.4732 | 83.04 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 250k | 0.4681 | 83.67 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K 50% | 500k | 0.4646 | 83.96 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 125k | 0.4728 | 82.92 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 250k | 0.4674 | 83.66 | huggingface | huggingface |
| SwinV2-Small | 49M | ImageNet-1K | 500k | 0.4641 | 84.08 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 125k | 0.4822 | 83.33 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 250k | 0.4997 | 83.60 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 10% | 500k | 0.5112 | 83.41 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 125k | 0.4703 | 83.86 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 250k | 0.4679 | 84.37 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 20% | 500k | 0.4711 | 84.61 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 125k | 0.4683 | 84.04 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 250k | 0.4633 | 84.57 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K 50% | 500k | 0.4598 | 84.95 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 125k | 0.4680 | 84.13 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 250k | 0.4626 | 84.65 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-1K | 500k | 0.4588 | 85.04 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 125k | 0.4695 | 84.11 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 250k | 0.4649 | 84.57 | huggingface | huggingface |
| SwinV2-Base | 87M | ImageNet-22K | 500k | 0.4614 | 85.11 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 125k | 0.4995 | 83.69 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 250k | 0.5140 | 83.66 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 10% | 500k | 0.5150 | 83.50 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 125k | 0.4675 | 84.38 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 250k | 0.4746 | 84.71 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 20% | 500k | 0.4960 | 84.59 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 125k | 0.4622 | 84.78 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 250k | 0.4566 | 85.38 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K 50% | 500k | 0.4530 | 85.80 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 125k | 0.4611 | 84.98 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 250k | 0.4552 | 85.45 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-1K | 500k | 0.4507 | 85.91 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 125k | 0.4649 | 84.61 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 250k | 0.4586 | 85.39 | huggingface | huggingface |
| SwinV2-Large | 195M | ImageNet-22K | 500k | 0.4536 | 85.81 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 125k | 0.4789 | 84.35 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 250k | 0.5038 | 84.16 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 500k | 0.5071 | 83.44 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 125k | 0.4549 | 85.09 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 250k | 0.4511 | 85.64 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 500k | 0.4559 | 85.69 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 125k | 0.4531 | 85.23 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 250k | 0.4464 | 85.90 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-1K | 500k | 0.4416 | 86.34 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 125k | 0.4564 | 85.14 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 250k | 0.4499 | 85.86 | huggingface | huggingface |
| SwinV2-Huge | 655M | ImageNet-22K | 500k | 0.4444 | 86.27 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-1K 50% | 125k | 0.4534 | 85.44 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-1K 50% | 250k | 0.4515 | 85.76 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-1K 50% | 500k | 0.4719 | 85.51 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-1K | 125k | 0.4513 | 85.57 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-1K | 250k | 0.4442 | 86.12 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-1K | 500k | 0.4395 | 86.46 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-22K | 125k | 0.4544 | 85.39 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-22K | 250k | 0.4475 | 85.96 | huggingface | huggingface |
| SwinV2-Giant | 1.06B | ImageNet-22K | 500k | 0.4416 | 86.53 | huggingface | huggingface |
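Each checkpoint in the table above is expected to be a standard PyTorch `.pth` file; the sketch below assumes the SimMIM convention of storing the weights under a `'model'` key and falls back to the raw dictionary if that key is absent. A minimal way to inspect a downloaded checkpoint:

```python
# Minimal sketch: load a downloaded checkpoint and inspect a few weight tensors.
# Assumes a standard PyTorch checkpoint; 'model' follows the SimMIM convention.
import torch

ckpt = torch.load("simmim_swinv2_base_in1k_500k.pth", map_location="cpu")  # placeholder filename
state_dict = ckpt.get("model", ckpt)  # fall back to the raw dict if there is no 'model' key

print(f"{len(state_dict)} tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```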
If you use these models, please cite the following papers:

@inproceedings{xie2021simmim,
  title={SimMIM: A Simple Framework for Masked Image Modeling},
  author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han},
  booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

@article{xie2022data,
  title={On Data Scaling in Masked Image Modeling},
  author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Wei, Yixuan and Dai, Qi and Hu, Han},
  journal={arXiv preprint arXiv:2206.04664},
  year={2022}
}

@inproceedings{liu2021swinv2,
  title={Swin Transformer V2: Scaling Up Capacity and Resolution},
  author={Ze Liu and Han Hu and Yutong Lin and Zhuliang Yao and Zhenda Xie and Yixuan Wei and Jia Ning and Yue Cao and Zheng Zhang and Li Dong and Furu Wei and Baining Guo},
  booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}