Mustafa Acikgoz committed · Commit 5a5ce51 · 1 Parent(s): 2e51bae

Add Hugging Face Space configuration
README.md CHANGED

@@ -1,77 +1,10 @@
- **Frozen Text Encoder:** Uses a pre-trained `DistilBERT` as a fixed feature extractor.
- **Projection Heads:** Maps both image and text features into a shared 256-dimensional space.
- **Custom Contrastive Loss:** Implements the unique loss function described in the book (a generic sketch of this setup follows the list).
- **Modular & Professional Code Structure:** The code is separated into logical files (`config.py`, `dataset.py`, `model.py`, `train.py`, `app.py`) for better organization and scalability.
- **End-to-End MLOps Pipeline:**
  - **Training:** A dedicated script to train the model and save the weights.
  - **Inference:** A standalone Streamlit web application for interactive text-to-image search.
  - **Hub Integration:** Detailed instructions for uploading the trained model and hosting the app on the Hugging Face Hub.
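The book's custom loss itself isn't reproduced on this page, so the sketch below uses the standard symmetric contrastive (InfoNCE-style) objective that projection heads like these are typically trained with. Everything here is illustrative: the class names, feature dimensions, and temperature value are assumptions, not the contents of `model.py`.

```python
# Illustrative sketch only: a frozen DistilBERT feature extractor, a
# 256-dim projection head, and a generic symmetric contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

# Frozen text encoder: used purely as a fixed feature extractor.
text_encoder = AutoModel.from_pretrained("distilbert-base-uncased")
for p in text_encoder.parameters():
    p.requires_grad = False

class ProjectionHead(nn.Module):
    """Maps encoder features into the shared 256-dimensional space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-normalised outputs make dot products cosine similarities.
        return F.normalize(self.proj(x), dim=-1)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of aligned (image, text) pairs."""
    logits = img_emb @ txt_emb.t() / temperature            # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```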
## Project Structure

    your-clip-project/
    │
    ├── data/
    │   ├── images/
    │   └── captions.txt
    │
    ├── app.py
    ├── config.py
    ├── dataset.py
    ├── model.py
    ├── train.py
    │
    ├── requirements.txt
    └── README.md
## Setup and Installation
**1. Clone the Repository:**
```bash
git clone <your-repo-url>
cd your-clip-project
```

**2. Create a Python Virtual Environment:**
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```

**3. Install Dependencies:**
```bash
pip install -r requirements.txt
```

**4. Download the Flickr8k Dataset:**
- Request the dataset from the official source: https://illinois.edu/fb/sec/1713398.
- Download and extract `Flickr8k_Dataset.zip` into the `data/images/` folder.
- Find a `captions.txt` file (commonly available on Kaggle versions of the dataset) and place it at `data/captions.txt` (see the sketch after these steps).
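Kaggle copies of `captions.txt` are usually a CSV with an `image,caption` header row; the loader below is a minimal sketch under that assumption. Verify your copy's format, since `dataset.py` defines the parsing that actually matters.

```python
# Hedged sketch: assumes the Kaggle-style "image,caption" CSV layout.
import csv

def load_captions(path: str = "data/captions.txt") -> list[tuple[str, str]]:
    """Return (image_filename, caption) pairs from the captions file."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["image"], row["caption"]) for row in csv.DictReader(f)]
```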
## How to Run

**Step 1: Train the Model**

First, you must train the model. This will create a `clip_book_model.pth` file containing the learned weights of the projection heads (an illustrative save/load sketch follows the command below).

Run the training script from your terminal:
```bash
python train.py
```
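What exactly `train.py` stores in `clip_book_model.pth` is defined by the script itself; purely as an illustration of the usual PyTorch idiom, a checkpoint holding two projection heads might be written and restored like this (keys and dimensions are hypothetical):

```python
# Hypothetical keys and dimensions; check train.py for the real layout.
import torch
from torch import nn

image_proj = nn.Linear(2048, 256)  # e.g. CNN features -> shared space
text_proj = nn.Linear(768, 256)    # DistilBERT features -> shared space

torch.save({"image_proj": image_proj.state_dict(),
            "text_proj": text_proj.state_dict()},
           "clip_book_model.pth")

state = torch.load("clip_book_model.pth", map_location="cpu")
image_proj.load_state_dict(state["image_proj"])
text_proj.load_state_dict(state["text_proj"])
```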
**Step 2: Launch the Web Application**

Once the model is trained, launch the interactive search engine with Streamlit:
```bash
streamlit run app.py
```

This will open a new tab in your browser with the application running.
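As a rough sketch of the search flow such an app implements: embed the query, score it against precomputed image embeddings, and show the best matches. Every name, path, and shape below is a placeholder for what `app.py` actually does.

```python
# Placeholder sketch of text-to-image search in Streamlit; embed_text and
# the random embeddings stand in for the trained model and real images.
import streamlit as st
import torch
import torch.nn.functional as F

image_paths = ["data/images/a.jpg", "data/images/b.jpg", "data/images/c.jpg"]
image_embs = F.normalize(torch.randn(len(image_paths), 256), dim=-1)

def embed_text(query: str) -> torch.Tensor:
    # Hypothetical stand-in for DistilBERT + the trained text projection head.
    return F.normalize(torch.randn(1, 256), dim=-1)

st.title("CLIP Text-to-Image Search")
query = st.text_input("Describe the image you are looking for")
if query:
    sims = (embed_text(query) @ image_embs.t()).squeeze(0)  # cosine similarity
    top = sims.topk(k=min(3, len(image_paths)))
    for score, idx in zip(top.values.tolist(), top.indices.tolist()):
        st.write(f"{image_paths[idx]} (similarity {score:.3f})")
```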
---
title: CLIP Text-to-Image Search
emoji: 🖼️
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
---
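On Spaces, `sdk` selects the runtime and `app_file` names the entrypoint, so this configuration launches `app.py` under Gradio 4.19.2. A minimal Gradio entrypoint compatible with that shape would look roughly like this (the `search` function is a hypothetical placeholder, not the project's logic):

```python
# Minimal Gradio entrypoint matching the config above; `search` is a
# hypothetical placeholder for the real text-to-image search.
import gradio as gr

def search(query: str) -> str:
    return f"Top matches for: {query!r}"  # placeholder output

demo = gr.Interface(fn=search, inputs="text", outputs="text",
                    title="CLIP Text-to-Image Search")

if __name__ == "__main__":
    demo.launch()
```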