---
title: Geo Spatial Multi Vector Search
emoji: π
colorFrom: yellow
colorTo: red
sdk: streamlit
sdk_version: 1.52.0
app_file: app.py
pinned: false
license: mit
---

<h1> <center>Geo-Spatial Chat with Qdrant & ColPali</center> </h1>

[Hugging Face Space](https://huggingface.co/spaces/mahimairaja/geo-spatial-multi-vector-search)
[Qdrant](https://qdrant.tech/)
[ColPali](https://github.com/illuin-tech/colpali)

Query geospatial burn scar data using natural language, powered by **ColPali (vidore/colSmol-500M)** multi-vector embeddings and **Qdrant**.

## ✨ Features

- **Natural Language Search**: Ask questions like _"Find burn scars larger than 500 hectares in California"_.
- **Multi-Vector Retrieval**: Uses ColPali-style embeddings from `vidore/colSmol-500M` for fine-grained, patch-level image retrieval.
- **Spatial Filtering**:
  - **Geocoding Dropdown**: Select US states or Canadian provinces to automatically focus the search.
  - **Radius Search**: Filter results within a specified radius (km) of a location.
- **Temporal Filtering**: Filter burn scars by acquisition date range.
- **Interactive Map**: Visualize results on a Folium map with popups displaying score, area, and RGB imagery.
- **Rich Results**: View top matches with confidence scores, metadata, and **Color (RGB)** imagery.

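The spatial and temporal filters listed above can be combined into a single Qdrant filter. The sketch below builds one as a plain dictionary in the shape of the Qdrant REST API. The payload keys `location` and `acquisition_date`, the helper name `build_filter`, and the sample coordinates are assumptions made for illustration; the app's actual payload schema may differ.

```python
import json

KM_TO_M = 1000.0  # Qdrant's geo_radius expects the radius in meters


def build_filter(lat, lon, radius_km, start_date, end_date,
                 geo_key="location", date_key="acquisition_date"):
    """Build a Qdrant REST-style filter (payload keys are hypothetical)."""
    return {
        "must": [
            {
                # Keep points whose geo payload lies within the circle.
                "key": geo_key,
                "geo_radius": {
                    "center": {"lat": lat, "lon": lon},
                    "radius": radius_km * KM_TO_M,
                },
            },
            {
                # Keep points whose date falls in the range (RFC 3339 strings).
                "key": date_key,
                "range": {"gte": start_date, "lte": end_date},
            },
        ]
    }


flt = build_filter(36.78, -119.42, 50,
                   "2020-01-01T00:00:00Z", "2020-12-31T23:59:59Z")
print(json.dumps(flt, indent=2))
```

Both conditions sit under `must`, so a result has to satisfy the radius and the date range at the same time.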
## Getting Started

### Prerequisites

- Python 3.10+
- A Qdrant instance (local or cloud)

### Installation

1. **Clone the repository:**

   ```bash
   git clone https://github.com/mahimairaja/geo-spatial-chat-qdrant.git
   cd geo-spatial-chat-qdrant
   ```

2. **Install dependencies:**

   Using `uv` (recommended):

   ```bash
   uv pip install -r requirements.txt
   ```

   Or with standard `pip`:

   ```bash
   pip install -r requirements.txt
   ```

3. **Set up the environment:**

   Create a `.env` file in the root directory:

   ```env
   QDRANT_URL=your_qdrant_url
   QDRANT_API_KEY=your_qdrant_api_key
   HF_TOKEN=your_huggingface_token
   ```

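A quick way to catch a missing or empty value before starting the app is to parse the `.env` text and check the three keys. This is a minimal standard-library sketch; the helper names `parse_env_text` and `missing_keys` and the sample values are hypothetical, and how the app itself loads these settings is not shown here.

```python
REQUIRED_KEYS = ("QDRANT_URL", "QDRANT_API_KEY", "HF_TOKEN")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


# Sample contents for illustration only; HF_TOKEN is left empty on purpose.
sample = "QDRANT_URL=http://localhost:6333\nQDRANT_API_KEY=secret\nHF_TOKEN=\n"
settings = parse_env_text(sample)
print(missing_keys(settings))  # ['HF_TOKEN']
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env_text`.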
### Data Ingestion

[Dataset on Hugging Face](https://huggingface.co/datasets/mahimairaja/ibm-hls-burn-original) [Open in Colab](https://colab.research.google.com/drive/1yiXCy2WVvvJREhiL75r63Oul_MFjNNO9?usp=sharing)

To ingest the dataset (HLS Burn Scars) into Qdrant:

```bash
python -m utils.ingest_to_qdrant
```

_Note: This process generates ColPali embeddings and may take some time depending on your hardware (a GPU is recommended)._

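Multi-vector embeddings need a collection that can hold several vectors per point. How `utils.ingest_to_qdrant` configures its collection is not shown here; as a rough sketch, the request body for creating such a collection through the Qdrant REST API could look like the one below. The 128-dimension size and cosine distance are assumptions to verify against the model's actual output, and `multivector_collection_body` is a hypothetical helper.

```python
import json


def multivector_collection_body(dim=128, distance="Cosine"):
    """Body for PUT /collections/{name} with multivector support enabled."""
    return {
        "vectors": {
            "size": dim,           # assumed size of each patch/token vector
            "distance": distance,  # assumed similarity metric
            # max_sim compares multi-vectors by late interaction.
            "multivector_config": {"comparator": "max_sim"},
        }
    }


body = multivector_collection_body()
print(json.dumps(body, indent=2))
```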
### Running the App

```bash
streamlit run app.py
```

## 🛠️ Technology Stack

- **Frontend**: [Streamlit](https://streamlit.io/)
- **Vector Database**: [Qdrant](https://qdrant.tech/)
- **Embedding Model**: [ColPali (vidore/colSmol-500M)](https://huggingface.co/vidore/colSmol-500M), a ColPali-style model for document and image retrieval built on the Idefics3 architecture.
- **Map Visualization**: [Folium](https://python-visualization.github.io/folium/) and `streamlit-folium`
- **Geocoding**: `geopy` (Nominatim API)

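ColPali-style models emit one vector per query token and one per image patch, and retrieval scores them with late interaction (MaxSim): each query vector is matched to its best image patch and the best matches are summed. The pure-Python sketch below illustrates that scoring rule on toy 2-dimensional vectors. It is not the app's code, and in practice the vector database computes this score.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors, doc_vectors):
    """Sum, over query vectors, of the best dot product with any doc vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Toy data: two query token vectors, two patch vectors per image.
query = [[1.0, 0.0], [0.0, 1.0]]
image_a = [[0.9, 0.1], [0.2, 0.8]]
image_b = [[0.5, 0.5], [0.4, 0.4]]

scores = {"image_a": maxsim_score(query, image_a),
          "image_b": maxsim_score(query, image_b)}
best = max(scores, key=scores.get)
print(best)  # image_a
```

Because each query vector picks its own best patch, an image can rank highly when different regions match different parts of the query.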
If you are interested in how the dataset was prepared, see this notebook: [Open in Colab](https://colab.research.google.com/drive/12KGJQ2UzQdaLIXbV258kh6tp9Ao7duVu?usp=sharing)

## License

MIT License