Apply for community grant: Academic project (gpu)

#1
by lorebianchi98 - opened

This is the official HuggingFace Space of the ICCV 2025 paper "Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation".

Talk2DINO is an open-vocabulary segmentation architecture that combines the localized and semantically rich patch-level features of DINOv2 with the multimodal understanding capabilities of CLIP. This is achieved by learning a projection from the CLIP text encoder to the embedding space of DINOv2 using only image-caption pairs and exploiting the self-attention properties of DINOv2 to understand which part of the image has to be aligned to the corresponding caption.

image

Hi @lorebianchi98 , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.
If you can, we ask that you upgrade to Pro ($9/month) to enjoy higher ZeroGPU quota and other features like Dev Mode, Private Storage, and more: hf.co/pro

Sign up or log in to comment