Towards Cross-View Point Correspondence in Vision-Language Models
Paper
•
2512.04686
•
Published
CroPond-3B is a vision-language model specialized in cross-view point correspondence. Built upon Qwen2.5-VL-3B-Instruct and mainly trained on the CrossPoint-378K dataset, CroPond achieves state-of-the-art performance on cross-view correspondence tasks.
For detailed evaluation instructions, please visit the GitHub repository.
@article{wang2025crosspoint,
title={Towards Cross-View Point Correspondence in Vision-Language Models},
author={Wang, Yipu and Ji, Yuheng and Liu, Yuyang and Zhou, Enshen and Yang, Ziqiang and Tian, Yuxuan and Qin, Ziheng and Liu, Yue and Tan, Huajie and Chi, Cheng and Ma, Zhiyuan and Zeng, Daniel Dajun and Zheng, Xiaolong},
journal={arXiv preprint arXiv:2512.04686},
year={2025}
}
Totally Free + Zero Barriers + No Login Required