CroPond-3B

CroPond-3B is a vision-language model specialized in cross-view point correspondence. Built upon Qwen2.5-VL-3B-Instruct and mainly trained on the CrossPoint-378K dataset, CroPond achieves state-of-the-art performance on cross-view correspondence tasks.

Evaluation

For detailed evaluation instructions, please visit the GitHub repository.

Citation

@article{wang2025crosspoint,
  title={Towards Cross-View Point Correspondence in Vision-Language Models},
  author={Wang, Yipu and Ji, Yuheng and Liu, Yuyang and Zhou, Enshen and Yang, Ziqiang and Tian, Yuxuan and Qin, Ziheng and Liu, Yue and Tan, Huajie and Chi, Cheng and Ma, Zhiyuan and Zeng, Daniel Dajun and Zheng, Xiaolong},
  journal={arXiv preprint arXiv:2512.04686},
  year={2025}
}

Downloads last month: 5

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for WangYipu2002/CroPond-3B

Base model

Qwen/Qwen2.5-VL-3B-Instruct

Finetuned

(619)

this model

Quantizations

2 models

Paper for WangYipu2002/CroPond-3B

Towards Cross-View Point Correspondence in Vision-Language Models

Paper • 2512.04686 • Published Dec 4, 2025

CroPond-3B

Evaluation

Citation

Model tree for WangYipu2002/CroPond-3B

Paper for WangYipu2002/CroPond-3B

🎉 Free Image Generator Now Available!