---
license: apache-2.0
tags:
  - image-feature-extraction
  - image-text-retrieval
  - multimodal
  - siglip
  - person-search
datasets:
  - custom
language:
  - en
pipeline_tag: image-feature-extraction
---

# 🔍 SigLIP Person Search - Open Set

This model is a fine-tuned version of **`google/siglip-base-patch16-224`** for open-set **person retrieval** from **natural language descriptions**. It is built to support **image-text similarity** in real-world retail and surveillance scenarios.

## 🧠 Use Case

This model lets you search for people in crowded environments (such as malls or stores) using only a **text prompt**, for example:

> "A man wearing a white t-shirt and carrying a brown shoulder bag"

The model returns the person crops that best match the description.

## 💾 Training

* Base: `google/siglip-base-patch16-224`
* Loss: Cosine InfoNCE
* Data: ReID dataset with multimodal attributes (generated via Gemini)
* Epochs: 10
* Usage: Retrieval-style search (not classification)

## 📈 Intended Use

* Smart surveillance
* Anonymous retail behavior tracking
* Human-in-the-loop retrieval
* Visual search & retrieval systems

## 🔧 How to Use

The snippet below extracts text features for a query; a fuller sketch that scores person crops against a query follows the notes at the end of this card.

```python
from transformers import AutoProcessor, AutoModel
import torch

processor = AutoProcessor.from_pretrained("adonaivera/siglip-person-search-openset")
model = AutoModel.from_pretrained("adonaivera/siglip-person-search-openset")

text = "A man wearing a white t-shirt and carrying a brown shoulder bag"
# SigLIP is trained with max-length padding, so pass padding="max_length"
inputs = processor(text=text, padding="max_length", return_tensors="pt")

with torch.no_grad():
    text_features = model.get_text_features(**inputs)
```

## 📌 Notes

* This model is optimized for **feature extraction** and **cosine similarity matching**
* It is not meant for classification or image generation
* Similarity threshold tuning is required depending on your application
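
## 🧪 End-to-End Retrieval Sketch

For retrieval you also need image embeddings for the candidate person crops. The minimal sketch below embeds both sides, L2-normalizes, and ranks crops by cosine similarity. It assumes crops come from an upstream person detector; the crop file names and the `0.2` threshold are illustrative placeholders.

```python
from PIL import Image
from transformers import AutoProcessor, AutoModel
import torch

processor = AutoProcessor.from_pretrained("adonaivera/siglip-person-search-openset")
model = AutoModel.from_pretrained("adonaivera/siglip-person-search-openset")
model.eval()

# Hypothetical person crops, e.g. produced by a person detector upstream
crops = [Image.open(path) for path in ["crop_001.jpg", "crop_002.jpg"]]
query = "A man wearing a white t-shirt and carrying a brown shoulder bag"

with torch.no_grad():
    image_inputs = processor(images=crops, return_tensors="pt")
    image_features = model.get_image_features(**image_inputs)

    text_inputs = processor(text=query, padding="max_length", return_tensors="pt")
    text_features = model.get_text_features(**text_inputs)

# L2-normalize so the dot product equals cosine similarity
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# One similarity score per crop, ranked highest first
scores = (image_features @ text_features.T).squeeze(-1)
ranking = scores.argsort(descending=True)

# Threshold is application-dependent (see the notes above); 0.2 is only a starting point
THRESHOLD = 0.2
matches = [i for i in ranking.tolist() if scores[i] > THRESHOLD]
print(scores.tolist(), matches)
```

In practice you would embed all gallery crops once, cache the normalized features, and compare each incoming text query against the cache, since only the text side changes per search.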