Paper: ColPali: Efficient Document Retrieval with Vision Language Models (arXiv:2407.01449)
Due to limited computing resources, I can currently only run experiments with 2B/4B models.

This version should not be used: it is solely the base version useful for deterministic LoRA initialization.
This model is built by merging Qwen/Qwen3-VL-2B-Instruct with Qwen3-VL-Embedding-2B.
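
To illustrate why a fixed base checkpoint helps with deterministic LoRA initialization, here is a minimal toy sketch in pure Python. It is not the procedure used to build this model: the helper names (`init_lora`, `matvec`, `lora_forward`), the seed, and the Gaussian scale are all hypothetical, and a real setup would rely on a library such as PEFT. It only shows two generic properties of LoRA: with a fixed seed the adapter starts from identical values on every run, and because the `B` matrix starts at zero the adapted layer initially reproduces the base layer exactly.

```python
import random


def init_lora(in_dim, out_dim, rank, seed):
    """Return (A, B) for a toy LoRA adapter.

    A is a rank x in_dim matrix of small Gaussian values,
    B is an out_dim x rank matrix of zeros.
    """
    rng = random.Random(seed)  # local generator, global RNG state is untouched
    A = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(out_dim)]
    return A, B


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute W x + scale * B (A x) for a frozen base weight W."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


# Toy frozen base weight (2 outputs, 3 inputs) and an input vector.
W = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
x = [1.0, 0.0, -1.0]

A1, B1 = init_lora(in_dim=3, out_dim=2, rank=2, seed=42)
A2, B2 = init_lora(in_dim=3, out_dim=2, rank=2, seed=42)

same_init = (A1 == A2) and (B1 == B2)                    # same seed, same adapter
unchanged = lora_forward(W, A1, B1, x) == matvec(W, x)   # B = 0, so no change yet
print(same_init, unchanged)
```

Starting every fine-tuning run from the same stored base weights and the same seed means that differences between runs come from the data and hyperparameters rather than from the initialization.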
❤️❤️❤️
Thanks to the ColPali team and the Qwen team for their excellent open-source work! I accomplished this work by standing on the shoulders of giants.
If you use any datasets or models from this organization in your research, please cite the original work as follows:
@misc{faysse2024colpaliefficientdocumentretrieval,
title={ColPali: Efficient Document Retrieval with Vision Language Models},
author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
year={2024},
eprint={2407.01449},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2407.01449},
}
Base model: Qwen/Qwen3-VL-2B-Instruct