xiaotianhan 
posted an update Mar 28, 2024
🎉 🎉 🎉 Happy to share our recent work. We noticed that image resolution plays an important role, whether in improving the performance of multi-modal large language models (MLLMs) or in Sora-style any-resolution encoder-decoders. We hope this work can help lift the 224x224 resolution limit in ViT.

ViTAR: Vision Transformer with Any Resolution (2403.18361)

Hiya, are you planning to open-source the models?

Thanks for your interest. Yes, we will open-source our code and pretrained weights soon.
