HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding • Paper • 2404.13400 • Published Apr 20, 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling • Paper • 2410.08021 • Published Oct 10, 2024
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding • Paper • 2305.08685 • Published May 15, 2023
SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification • Paper • 2211.16191 • Published Nov 28, 2022