MahmoodLab/UNI2-h · Inquiry on Compatibility of DINOv2 Self-Supervised Training with UNI2's ViT-H Architecture

Dear UNI2 authors,

DINOv2’s official implementation does not ship a ViT-H configuration matching UNI2-h (681M parameters, patch size 14, embedding dimension 1536). Is it possible to modify their GitHub code to work with UNI2’s architecture? Specifically:

Did you adjust DINOv2’s code (e.g., projection heads, loss functions) for UNI2’s custom ViT-H?
Are there known issues when running DINOv2’s self-supervised pipeline (multi-crop, momentum teacher) starting from UNI2’s released weights?
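
For reference, this is the rough parameter count I used to sanity-check the figures above. It is only a sketch. The depth of 24, the SwiGLU MLP with a gated width of 4096, the LayerScale parameters, the 8 register tokens, and the patch-only position embedding at 224x224 are my assumptions from reading the UNI2-h model card, so please correct me if any of them are wrong. `vit_param_count` is a hypothetical helper of my own and is not part of DINOv2 or timm.

```python
def vit_param_count(dim=1536, depth=24, patch=14, img=224, in_chans=3,
                    swiglu_hidden=4096, reg_tokens=8):
    """Back-of-envelope parameter count for a ViT with SwiGLU MLP and LayerScale.

    All defaults except dim and patch are assumptions, not confirmed values.
    """
    def linear(n_in, n_out):
        # Weight matrix plus bias vector.
        return n_in * n_out + n_out

    # Attention: fused QKV projection plus output projection.
    attn = linear(dim, 3 * dim) + linear(dim, dim)
    # Packed SwiGLU: the first layer emits gate and value (2 * hidden),
    # the second maps the gated hidden width back to dim.
    mlp = linear(dim, 2 * swiglu_hidden) + linear(swiglu_hidden, dim)
    norms = 2 * (2 * dim)        # two LayerNorms per block (scale + shift)
    layerscale = 2 * dim         # one LayerScale vector per residual branch
    block = attn + mlp + norms + layerscale

    patch_embed = linear(in_chans * patch * patch, dim)
    tokens = dim + reg_tokens * dim          # class token + register tokens
    pos_embed = (img // patch) ** 2 * dim    # assumed: patches only
    final_norm = 2 * dim

    return depth * block + patch_embed + tokens + pos_embed + final_norm


if __name__ == "__main__":
    n = vit_param_count()
    print(f"{n / 1e6:.1f}M parameters")
```

Under these assumptions the count comes out close to the 681M quoted above, which is why I believe the backbone differs from DINOv2’s stock configurations mainly in width, depth, and MLP type.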

A brief clarification would be very helpful. Thank you very much!

Best,
LI