---
license: apache-2.0
tags:
- image-feature-extraction
- image-text-retrieval
- multimodal
- siglip
- person-search
datasets:
- custom
language:
- en
pipeline_tag: image-feature-extraction
---
|
|
|
|
|
# SigLIP Person Search - Open Set
|
|
|
|
|
This model is a fine-tuned version of **`google/siglip-base-patch16-224`** for open-set **person retrieval** from **natural language descriptions**. It is built for **image-text similarity** search in real-world retail and surveillance scenarios.
|
|
|
|
|
## Use Case
|
|
|
|
|
This model allows you to search for people in crowded environments (like malls or stores) using only a **text prompt**, for example: |
|
|
|
|
|
> "A man wearing a white t-shirt and carrying a brown shoulder bag" |
|
|
|
|
|
The model will return person crops that match the description. |
|
|
|
|
|
## Training
|
|
|
|
|
* Base: `google/siglip-base-patch16-224` |
|
|
* Loss: Cosine InfoNCE |
|
|
* Data: ReID dataset with multimodal attributes (generated via Gemini) |
|
|
* Epochs: 10 |
|
|
* Usage: Retrieval-style search (not classification) |
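The exact loss implementation is not published in this card; a minimal sketch of a symmetric cosine InfoNCE objective (the temperature value and embedding size here are illustrative assumptions, not the training configuration) might look like:

```python
import torch
import torch.nn.functional as F

def cosine_infonce(img_emb, txt_emb, temperature=0.07):
    # L2-normalize so dot products become cosine similarities
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(img.size(0))         # matched pairs lie on the diagonal
    # symmetric cross-entropy over image->text and text->image directions
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Random stand-ins for a batch of paired image/text embeddings
img_emb = torch.randn(8, 768)
txt_emb = torch.randn(8, 768)
loss = cosine_infonce(img_emb, txt_emb)
```

Each image embedding is contrasted against every caption in the batch, pulling matched pairs together and pushing mismatched pairs apart.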
|
|
|
|
|
## Intended Use
|
|
|
|
|
* Smart surveillance |
|
|
* Anonymous retail behavior tracking |
|
|
* Human-in-the-loop retrieval |
|
|
* Visual search & retrieval systems |
|
|
|
|
|
## How to Use
|
|
|
|
|
```python
from transformers import AutoProcessor, AutoModel
import torch

processor = AutoProcessor.from_pretrained("adonaivera/siglip-person-search-openset")
model = AutoModel.from_pretrained("adonaivera/siglip-person-search-openset")

text = "A man wearing a white t-shirt and carrying a brown shoulder bag"
# SigLIP was trained with max_length padding, so pad text inputs the same way
inputs = processor(text=text, padding="max_length", return_tensors="pt")
with torch.no_grad():
    text_features = model.get_text_features(**inputs)
```
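To turn that text embedding into a retrieval result, embed each detected person crop with `model.get_image_features` and rank crops by cosine similarity. A sketch of the ranking step, using random tensors as stand-ins for the model outputs (the 768 dimension matches `siglip-base`):

```python
import torch
import torch.nn.functional as F

# Stand-ins for model.get_image_features(...) over a gallery of 100 person
# crops and model.get_text_features(...) for one query description.
gallery_features = torch.randn(100, 768)
text_features = torch.randn(1, 768)

# Normalize so the dot product is cosine similarity
gallery = F.normalize(gallery_features, dim=-1)
query = F.normalize(text_features, dim=-1)

scores = (gallery @ query.t()).squeeze(-1)  # one similarity score per crop
top_scores, top_idx = scores.topk(5)        # indices of the best-matching crops
```

`top_idx` then indexes back into your list of crops to display the matches.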
|
|
|
|
|
## Notes
|
|
|
|
|
* This model is optimized for **feature extraction** and **cosine similarity matching** |
|
|
* It's not meant for classification or image generation |
|
|
* Similarity threshold tuning is required depending on your application |
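In an open-set setting, a query may match zero, one, or many crops, so a fixed top-k is often replaced by an acceptance threshold. A minimal example of threshold filtering (the scores and the 0.30 cutoff are purely illustrative; tune the threshold on validation data for your deployment):

```python
import torch

# Illustrative cosine similarities for four candidate crops
scores = torch.tensor([0.41, 0.12, 0.33, 0.27])
threshold = 0.30  # application-specific; tune on held-out data

# Keep only the crops whose similarity clears the threshold
matches = (scores >= threshold).nonzero(as_tuple=True)[0]
# matches -> indices 0 and 2
```

Raising the threshold trades recall for precision; surveillance-style alerting usually favors a higher cutoff than exploratory search.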
|
|
|