ViewDelta: Text-Conditioned Scene Change Detection
ViewDelta is a generalized framework for Scene Change Detection (SCD) that uses natural language prompts to define what changes are relevant. Unlike traditional change detection methods that implicitly learn what constitutes a "relevant" change from dataset labels, ViewDelta allows users to explicitly specify at runtime what types of changes they care about through text prompts.
Overview
Given two images captured at different times and a text prompt describing the type of change to detect (e.g., "vehicle", "driveway", or "all changes"), ViewDelta produces a binary segmentation mask highlighting the relevant changes. The model is trained jointly on multiple datasets (CSeg, PSCD, SYSU-CD, VL-CMU-CD) and can:
- Detect user-specified changes via natural language prompts
- Handle unaligned image pairs with viewpoint variations
- Generalize across diverse domains (street-view, satellite, indoor/outdoor scenes)
- Detect all changes or specific semantic categories
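Conceptually, inference maps an image pair plus a text prompt to a per-pixel binary mask. The stub below is only a sketch of that signature; the function name and types are hypothetical and are not the repository's API (the actual entry point is inference.py, shown later):
from PIL import Image
import numpy as np
# Hypothetical sketch of the task interface -- not ViewDelta's actual API.
def detect_changes(image_before: Image.Image, image_after: Image.Image, prompt: str) -> np.ndarray:
    """Return an HxW binary mask; 1 marks pixels where the prompted change occurs."""
    raise NotImplementedError("Illustrative stub; use inference.py for real runs.")
# mask = detect_changes(Image.open("before.jpg"), Image.open("after.jpg"), "vehicle")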
For more details, see the paper: ViewDelta: Scaling Scene Change Detection through Text-Conditioning
Installation
Prerequisites
Note: ViewDelta has only been tested on Linux with the following setup:
- Python 3.10
- CUDA 12.1 (for GPU acceleration)
- NVIDIA GPU (tested on RTX 4090, L40S, and A100 - other GPUs may work)
- Pixi package manager
Clone Repository
git clone https://github.com/drags99/viewdelta-scd.git
Install Pixi
First, install the Pixi package manager:
# On Linux
curl -fsSL https://pixi.sh/install.sh | bash
For more installation options, visit: https://pixi.sh/latest/installation/
Install ViewDelta Dependencies
Once Pixi is installed, run the following from the root of the cloned repository to install the dependencies:
pixi install
This will automatically set up the environment with all required dependencies including PyTorch, transformers, and other libraries.
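As an optional sanity check, you can confirm that PyTorch is available inside the Pixi environment and that it sees your GPU:
pixi run python -c "import torch; print(torch.__version__, torch.cuda.is_available())"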
Running the Model
Download ViewDelta Model Weights
wget https://huggingface.co/hoskerelab/ViewDelta/resolve/main/viewdelta_checkpoint.pth
Basic Usage
The repository includes an inference.py script for running the model on image pairs. Here's how to use it:
- Prepare your images: Place the two images you want to compare in the repository directory.
- Download a pre-trained checkpoint: You'll need a model checkpoint file (e.g., the viewdelta_checkpoint.pth downloaded above).
- Edit the inference script: Modify inference.py to specify your images, text prompt, and checkpoint path:
image_a_list = ["before_image.jpg"]
image_b_list = ["after_image.jpg"]
text_list = ["vehicle"] # or "all" for all changes, or specific objects like "building", "tree", etc.
# Path to your checkpoint
PATH_TO_CHECKPOINT = "path/to/checkpoint.pth"
- Run inference:
pixi run python inference.py
Output
The script generates several outputs:
- {image_name}_mask_{text}.png: The binary segmentation mask
- {image_name}_image_a_overlay.png: First image with changes highlighted
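To post-process a saved mask, you can load it with Pillow and NumPy. The filename below is a hypothetical instantiation of the naming pattern above (before_image.jpg with the prompt "vehicle"):
import numpy as np
from PIL import Image
# Load the saved binary mask and report how much of the image changed.
mask = np.array(Image.open("before_image_mask_vehicle.png").convert("L")) > 0
print(f"changed pixels: {mask.sum()} ({100.0 * mask.mean():.2f}% of the image)")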
Text Prompt Examples
ViewDelta supports various types of text prompts:
- Detect all changes: "What are the differences?", "Find any differences"
- Specific objects: "vehicle", "building", "tree", "person"
- Multiple objects: "vehicle, sign, barrier", "cars and pedestrians"
- Natural language: "Has any construction equipment been added?", "What buildings have changed?"
Model Configuration
The model uses:
- Text embeddings: SigLIP (contrastive vision-language alignment)
- Image embeddings: DINOv2 (frozen pretrained features)
- Architecture: Vision Transformer (ViT) with 12 layers
- Input resolution: Images are automatically resized to 256×256
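Since the script resizes inputs automatically, pre-resizing is optional; but if you want to match the model's 256×256 working resolution when preparing data yourself, a minimal Pillow step looks like this:
from PIL import Image
# Resize an input image to the model's 256x256 working resolution.
img = Image.open("before_image.jpg").convert("RGB").resize((256, 256), Image.BILINEAR)
img.save("before_image_256.png")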
Citation
@inproceedings{Varghese2024ViewDeltaSS,
title={ViewDelta: Scaling Scene Change Detection through Text-Conditioning},
author={Subin Varghese and Joshua Gao and Vedhus Hoskere},
year={2024},
url={https://api.semanticscholar.org/CorpusID:280642249}
}