---
license: apache-2.0
tags:
- image-feature-extraction
- image-text-retrieval
- multimodal
- siglip
- person-search
datasets:
- custom
language:
- en
pipeline_tag: image-feature-extraction
---
# πŸ” SigLIP Person Search - Open Set
This model is a fine-tuned version of **`google/siglip-base-patch16-224`** for open-set **person retrieval** based on **natural language descriptions**. It's built to support **image-text similarity** in real-world retail and surveillance scenarios.
## 🧠 Use Case
This model allows you to search for people in crowded environments (like malls or stores) using only a **text prompt**, for example:
> "A man wearing a white t-shirt and carrying a brown shoulder bag"
The model will return person crops that match the description.
## πŸ’Ύ Training
* Base: `google/siglip-base-patch16-224`
* Loss: Cosine InfoNCE
* Data: ReID dataset with multimodal attributes (generated via Gemini)
* Epochs: 10
* Usage: Retrieval-style search (not classification)
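The card names Cosine InfoNCE as the training loss but does not publish its exact formulation; below is a minimal sketch of a symmetric cosine InfoNCE objective as it is commonly written for image-text contrastive training (the function name and the temperature value are illustrative assumptions, not taken from the original training code):

```python
import torch
import torch.nn.functional as F

def cosine_infonce(img_emb, txt_emb, temperature=0.07):
    # L2-normalize so dot products become cosine similarities
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    # (B, B) similarity matrix; matching image-text pairs lie on the diagonal
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=logits.device)
    # Symmetric contrastive loss over both retrieval directions
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

Each image embedding is pulled toward its paired text embedding and pushed away from the other texts in the batch (and vice versa), which is what makes the resulting features suitable for retrieval rather than classification.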
## πŸ“ˆ Intended Use
* Smart surveillance
* Anonymous retail behavior tracking
* Human-in-the-loop retrieval
* Visual search & retrieval systems
## πŸ”§ How to Use
```python
from transformers import AutoProcessor, AutoModel
import torch

processor = AutoProcessor.from_pretrained("adonaivera/siglip-person-search-openset")
model = AutoModel.from_pretrained("adonaivera/siglip-person-search-openset")

text = "A man wearing a white t-shirt and carrying a brown shoulder bag"
# SigLIP is trained with padded text inputs, so pad to max length
inputs = processor(text=text, padding="max_length", return_tensors="pt")

with torch.no_grad():
    text_features = model.get_text_features(**inputs)
```
## πŸ“Œ Notes
* This model is optimized for **feature extraction** and **cosine similarity matching**
* It's not meant for classification or image generation
* Similarity threshold tuning is required depending on your application
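To make the notes above concrete: once text features and image features (via `model.get_image_features` on your person crops) are extracted, matching reduces to cosine-similarity ranking plus an application-tuned threshold. A minimal sketch using random stand-in embeddings (the function name, threshold value, and embedding size are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def rank_matches(text_feat, image_feats, threshold=0.2):
    # Score every person crop against the text query with cosine similarity
    sims = F.cosine_similarity(text_feat, image_feats)  # shape (N,)
    order = sims.argsort(descending=True)
    # Keep only crops above the application-specific threshold
    return [(int(i), float(sims[i])) for i in order if sims[i] >= threshold]

# Stand-ins for real model outputs: one text query vs. five person crops
text_feat = F.normalize(torch.randn(1, 768), dim=-1)
image_feats = F.normalize(torch.randn(5, 768), dim=-1)
matches = rank_matches(text_feat, image_feats)
```

In practice, the right threshold depends on your crop quality and how costly false positives are, so calibrate it on a held-out set of labeled query/crop pairs.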