Mustafa Acikgoz committed on
Commit 5a5ce51 · 1 Parent(s): 2e51bae

Add Hugging Face Space configuration

Files changed (1)
  1. README.md +10 -77
README.md CHANGED
@@ -1,77 +1,10 @@
- # CLIP-Style Image Search Engine (Textbook Implementation)
-
- This project provides a complete, modular, and end-to-end implementation of a CLIP-style model for text-to-image search. The architecture and training methodology are a faithful reproduction of the approach described in Chapter 14 of the textbook, "Building an Image Search Engine Using CLIP: a Multimodal Approach".
-
- The project is structured for clarity and maintainability, making it an ideal portfolio piece to showcase skills in PyTorch, model implementation, and MLOps practices like deployment with Streamlit and Hugging Face.
-
- ## Key Features
-
- - **Faithful "Book Version" Architecture:** Implements the specific design choices from the textbook (a minimal sketch of this design follows the list):
-   - **Frozen Vision Encoder:** Uses a pre-trained `ResNet50` as a fixed feature extractor.
-   - **Frozen Text Encoder:** Uses a pre-trained `DistilBERT` as a fixed feature extractor.
-   - **Projection Heads:** Map both image and text features into a shared 256-dimensional space.
-   - **Custom Contrastive Loss:** Implements the unique loss function described in the book.
- - **Modular & Professional Code Structure:** The code is separated into logical files (`config.py`, `dataset.py`, `model.py`, `train.py`, `app.py`) for better organization and scalability.
- - **End-to-End MLOps Pipeline:**
-   - **Training:** A dedicated script to train the model and save the weights.
-   - **Inference:** A standalone Streamlit web application for interactive text-to-image search.
-   - **Hub Integration:** Detailed instructions for uploading the trained model and hosting the app on the Hugging Face Hub.
-
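As a rough illustration of the architecture described above, here is a minimal PyTorch sketch of frozen encoders feeding small trainable projection heads into a shared 256-dimensional space. This is not the repository's actual `model.py` (which is not shown in this commit); the class names, projection design, and pooling choices are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50
from transformers import DistilBertModel

class ProjectionHead(nn.Module):
    """Maps encoder features into the shared 256-d embedding space."""
    def __init__(self, in_dim, out_dim=256):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # L2-normalize so dot products between embeddings are cosine similarities
        return F.normalize(self.proj(x), dim=-1)

class BookStyleCLIP(nn.Module):
    def __init__(self, embed_dim=256):
        super().__init__()
        # Frozen vision encoder: ResNet50 with its classification head removed
        backbone = resnet50(weights="IMAGENET1K_V2")
        self.vision_encoder = nn.Sequential(*list(backbone.children())[:-1])
        # Frozen text encoder: DistilBERT
        self.text_encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        for p in self.text_encoder.parameters():
            p.requires_grad = False
        # Only the two projection heads are trained
        self.image_proj = ProjectionHead(2048, embed_dim)
        self.text_proj = ProjectionHead(768, embed_dim)

    def forward(self, images, input_ids, attention_mask):
        img_feat = self.vision_encoder(images).flatten(1)       # (B, 2048)
        txt_out = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask)
        txt_feat = txt_out.last_hidden_state[:, 0]               # (B, 768), first-token pooling
        return self.image_proj(img_feat), self.text_proj(txt_feat)
```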
- ## Project Structure
- your-clip-project/
- │
- ├── data/
- │   ├── images/
- │   └── captions.txt
- │
- ├── app.py
- ├── config.py
- ├── dataset.py
- ├── model.py
- ├── train.py
- │
- ├── requirements.txt
- └── README.md
-
-
- ## Setup and Installation
-
- **1. Clone the Repository:**
-
- ```bash
- git clone <your-repo-url>
- cd your-clip-project
- ```
-
- **2. Create a Python Virtual Environment:**
-
- ```bash
- python -m venv venv
- source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- ```
-
- **3. Install Dependencies:**
-
- ```bash
- pip install -r requirements.txt
- ```
-
- **4. Download the Flickr8k Dataset:**
-
- Request the dataset from the official source: https://illinois.edu/fb/sec/1713398.
-
- Download and extract Flickr8k_Dataset.zip into the data/images/ folder.
-
- Find a captions.txt file (commonly available on Kaggle versions of the dataset) and place it at data/captions.txt; a quick format check is sketched below.
-
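As a quick sanity check, the Kaggle distributions of Flickr8k usually ship captions.txt as a comma-separated file with image and caption columns (five captions per image). The snippet below only illustrates that assumed layout; adjust the parsing if your copy uses a different delimiter:

```python
from pathlib import Path
import pandas as pd

# Assumed Kaggle-style layout: CSV with "image" and "caption" columns.
captions = pd.read_csv("data/captions.txt")
print(captions.columns.tolist())   # expected: ['image', 'caption']
print(len(captions), "caption rows")

# Verify that every referenced image file is present in data/images/
image_dir = Path("data/images")
missing = [name for name in captions["image"].unique() if not (image_dir / name).exists()]
print(len(missing), "referenced images are missing")
```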
- ## How to Run
-
- ### Step 1: Train the Model
-
- First, you must train the model. This will create a clip_book_model.pth file containing the learned weights of the projection heads.
-
- Run the training script from your terminal:
-
- ```bash
- python train.py
- ```
-
- ### Step 2: Launch the Web Application
-
- Once the model is trained, launch the interactive search engine with Streamlit:
-
- ```bash
- streamlit run app.py
- ```
-
- This will open a new tab in your browser with the application running.
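At inference time, the app's search reduces to embedding the query text and ranking pre-computed image embeddings by cosine similarity. The sketch below is not the repository's actual app.py; it assumes a model exposing `text_encoder` and `text_proj` attributes (as in the illustrative class shown earlier) and an already-normalized (N, 256) matrix of gallery image embeddings:

```python
import torch
from transformers import DistilBertTokenizer

@torch.no_grad()
def search(model, query, image_embeddings, image_paths, k=9):
    """Return the paths of the k images whose embeddings best match the query.

    image_embeddings: pre-computed, L2-normalized tensor of shape (N, 256),
    produced once by running every gallery image through the image branch.
    """
    tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
    tokens = tokenizer([query], padding=True, truncation=True, return_tensors="pt")
    txt_feat = model.text_encoder(**tokens).last_hidden_state[:, 0]
    query_emb = model.text_proj(txt_feat)                 # (1, 256), L2-normalized
    scores = (image_embeddings @ query_emb.T).squeeze(1)  # cosine similarity per image
    top = scores.topk(k).indices.tolist()
    return [image_paths[i] for i in top]
```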
 
+ ---
+ title: CLIP Text-to-Image Search
+ emoji: 🖼️
+ colorFrom: green
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 4.19.2
+ app_file: app.py
+ pinned: false
+ ---