Issue with ColPali: Incorrect Page Retrieval for Fund-specific Queries
Hi ColPali team,
I’m using ColPali on a company's fund PDF, but fund-specific queries often return the wrong page. It tends to:
Prefer pages where the fund name appears most frequently, or
Match high-frequency query words rather than true semantic context.
This leads to irrelevant pages when multiple funds share similar terms. Although vidore/colpali-v1.3 is said to use ColBERT-style late interaction, queries are not being matched in context to the correct pages. How can we improve semantic matching so that ColPali retrieves the most contextually relevant pages for fund-specific queries?
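For reference, here is my understanding of ColBERT-style late-interaction (MaxSim) scoring, as a minimal sketch. It is plain Python with made-up 2-d vectors, not ColPali's actual code; `maxsim_score` and the toy embeddings are illustrative names and values only.

```python
def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query token, take the best
    dot product against any page patch, then sum over query tokens."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, p)) for p in page_vecs)
    return total


# Toy embeddings (assumed values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]                    # two query tokens
page_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]       # repeats one concept
page_b = [[0.9, 0.0], [0.0, 0.9]]                   # covers both concepts

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
```

In this toy case the repeated patch on `page_a` does not add to the score, since each query token only counts its single best match, so `page_b` scores higher. On real pages the learned embeddings decide which patches match, which may be where my fund-name confusion comes from.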
Try ColQwen first and see if it improves things?
I have tried both ColPali and ColQwen. On a set of 10 specific queries, both retrieved the correct page as the top result for 8 queries. For the remaining 2 queries I considered the top 5 pages, and ColPali retrieved the required page within the top 5.
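A small sketch of how this kind of top-1 versus top-5 check could be scripted, in case it helps others reproduce it. The function names and the `toy_results` data are placeholders, not my actual queries or page numbers.

```python
def recall_at_k(ranked_pages, relevant_page, k):
    """1.0 if the relevant page is among the top-k retrieved pages, else 0.0."""
    return 1.0 if relevant_page in ranked_pages[:k] else 0.0


def mean_recall_at_k(results, k):
    """Average recall@k over (ranked_pages, relevant_page) pairs."""
    hits = [recall_at_k(ranked, relevant, k) for ranked, relevant in results]
    return sum(hits) / len(hits)


# Placeholder data: (pages ranked by the retriever, the page that answers the query).
toy_results = [
    ([12, 3, 7, 40, 5], 12),   # hit at rank 1
    ([8, 21, 2, 9, 30], 2),    # hit at rank 3
    ([4, 6, 1, 15, 22], 99),   # miss
]

recall_at_1 = mean_recall_at_k(toy_results, 1)
recall_at_5 = mean_recall_at_k(toy_results, 5)
```

Comparing `recall_at_1` and `recall_at_5` per model shows how many queries only succeed once the deeper ranks are considered.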
It's hard to guarantee perfect performance...
Fine-tuning on domain-specific data might help.
Reranking the top 10 results using a large proprietary model such as GPT-5 might also help, and can also give you a textual response.
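A rough sketch of the reranking idea, assuming you already have retriever scores per page. `rerank`, `toy_score`, and `page_text` are hypothetical names; `toy_score` is a simple word-overlap stand-in for whatever model call you would actually use to judge relevance, and no real API is shown here.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[Tuple[int, float]],
    score_fn: Callable[[str, int], float],
    top_n: int = 10,
) -> List[Tuple[int, float]]:
    """Keep the top_n pages by retriever score, then reorder them by score_fn."""
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    rescored = [(page, score_fn(query, page)) for page, _ in shortlist]
    return sorted(rescored, key=lambda c: c[1], reverse=True)


# Placeholder page contents (assumed, for illustration only).
page_text = {
    1: "Fund A overview",
    2: "Fund B fees and charges",
    3: "Fund A fees and charges",
}


def toy_score(query: str, page: int) -> float:
    """Stand-in scorer: fraction of query words found on the page.
    In practice this would be replaced by a call to a larger model."""
    q_words = set(query.lower().split())
    p_words = set(page_text[page].lower().split())
    return len(q_words & p_words) / len(q_words)


# (page number, retriever score) pairs, highest retriever score first.
candidates = [(1, 0.91), (2, 0.88), (3, 0.85)]
reranked = rerank("Fund A fees", candidates, toy_score)
```

Here page 3 was ranked last by the retriever but moves to the top after rescoring, which is the effect you would hope for when several funds share similar wording.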
Thanks!