---
license: apache-2.0
tags:
  - image-feature-extraction
  - image-text-retrieval
  - multimodal
  - siglip
  - person-search
datasets:
  - custom
language:
  - en
pipeline_tag: image-feature-extraction
---

# 🔍 SigLIP Person Search - Open Set

This model is a fine-tuned version of **`google/siglip-base-patch16-224`** for open-set **person retrieval** from **natural language descriptions**. It is built to support **image-text similarity** in real-world retail and surveillance scenarios.

## 🧠 Use Case

This model lets you search for people in crowded environments (such as malls or stores) using only a **text prompt**, for example:

> "A man wearing a white t-shirt and carrying a brown shoulder bag"

The model returns the person crops that best match the description.

## 💾 Training

* Base: `google/siglip-base-patch16-224`
* Loss: Cosine InfoNCE
* Data: ReID dataset with multimodal attributes (generated via Gemini)
* Epochs: 10
* Usage: Retrieval-style search (not classification)

## 📈 Intended Use

* Smart surveillance
* Anonymous retail behavior tracking
* Human-in-the-loop retrieval
* Visual search & retrieval systems

## 🔧 How to Use

The snippet below extracts text features for a query; a fuller sketch that scores person crops against a query follows the notes at the end of this card.

```python
from transformers import AutoProcessor, AutoModel
import torch

processor = AutoProcessor.from_pretrained("adonaivera/siglip-person-search-openset")
model = AutoModel.from_pretrained("adonaivera/siglip-person-search-openset")

text = "A man wearing a white t-shirt and carrying a brown shoulder bag"
# SigLIP is trained with max-length padding, so pass padding="max_length"
inputs = processor(text=text, padding="max_length", return_tensors="pt")

with torch.no_grad():
    text_features = model.get_text_features(**inputs)
```

## 📌 Notes

* This model is optimized for **feature extraction** and **cosine similarity matching**
* It is not meant for classification or image generation
* Similarity threshold tuning is required depending on your application
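
## 🧪 End-to-End Retrieval Sketch

For retrieval you also need image embeddings for the candidate person crops. The minimal sketch below embeds both sides, L2-normalizes, and ranks crops by cosine similarity. It assumes crops come from an upstream person detector; the crop file names and the `0.2` threshold are illustrative placeholders.

```python
from PIL import Image
from transformers import AutoProcessor, AutoModel
import torch

processor = AutoProcessor.from_pretrained("adonaivera/siglip-person-search-openset")
model = AutoModel.from_pretrained("adonaivera/siglip-person-search-openset")
model.eval()

# Hypothetical person crops, e.g. produced by a person detector upstream
crops = [Image.open(path) for path in ["crop_001.jpg", "crop_002.jpg"]]
query = "A man wearing a white t-shirt and carrying a brown shoulder bag"

with torch.no_grad():
    image_inputs = processor(images=crops, return_tensors="pt")
    image_features = model.get_image_features(**image_inputs)

    text_inputs = processor(text=query, padding="max_length", return_tensors="pt")
    text_features = model.get_text_features(**text_inputs)

# L2-normalize so the dot product equals cosine similarity
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# One similarity score per crop, ranked highest first
scores = (image_features @ text_features.T).squeeze(-1)
ranking = scores.argsort(descending=True)

# Threshold is application-dependent (see the notes above); 0.2 is only a starting point
THRESHOLD = 0.2
matches = [i for i in ranking.tolist() if scores[i] > THRESHOLD]
print(scores.tolist(), matches)
```

In practice you would embed all gallery crops once, cache the normalized features, and compare each incoming text query against the cache, since only the text side changes per search.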