borso271 committed
Commit c190603 · 1 Parent(s): b48deab

Enhance README with comprehensive documentation

- Add detailed architecture overview and API examples
- Include admin operations guide and Hub integration docs
- Add performance optimization and security notes

Files changed (1)
  1. README.md +168 -26
README.md CHANGED
@@ -12,42 +12,184 @@ license: mit
 
  # 📸 MobileCLIP-B Image Classifier
 
- Interactive web interface for Apple's MobileCLIP-B zero-shot image classification model.
 
- ## Features
 
- - 🖼️ **Image Classification**: Upload any image for instant classification
- - 🏷️ **Dynamic Labels**: Add and manage classification labels on-the-fly
- - 📊 **Visual Results**: See confidence scores with interactive charts
- - ⚡ **Fast Inference**: Optimized for < 30ms latency on GPU
- - 🔒 **Admin Panel**: Secure label management interface
 
- ## Environment Variables
 
- Configure these in your Space Settings (Settings → Variables and secrets):
 
  | Variable | Description | Required |
  |----------|-------------|----------|
- | `ADMIN_TOKEN` | Secret token for admin operations | Yes (for admin features) |
- | `HF_LABEL_REPO` | Hub dataset repo for label storage (e.g., `username/mobileclip-labels`) | No |
- | `HF_WRITE_TOKEN` | Hugging Face token with write permissions | No |
- | `HF_READ_TOKEN` | Hugging Face token with read permissions | No |
 
- ## How It Works
 
- 1. **Model**: Uses MobileCLIP-B with re-parameterized MobileOne blocks for efficient inference
- 2. **Labels**: Loads from `items.json` or dynamically from Hub repository
- 3. **Processing**: Pre-computes text embeddings for fast classification
- 4. **Interface**: Gradio provides the web UI with image upload and admin controls
 
- ## Admin Features
 
- With proper authentication, admins can:
- - Add new classification labels without redeploying
- - Reload specific label versions from the Hub
- - View current statistics and label information
 
- ## License
 
- - Model weights: Apple Sample Code License (ASCL)
- - Interface code: MIT
 
  # 📸 MobileCLIP-B Image Classifier
 
+ Zero-shot image classification powered by Apple's MobileCLIP-B model, served through an interactive Gradio web interface. This application enables real-time image classification against a dynamic set of text labels, with support for admin-managed label updates and optional Hugging Face Hub persistence.
 
+ ## 🎯 Key Features
 
+ ### Core Capabilities
+ - **🖼️ Zero-Shot Classification**: Upload any image for instant classification without model retraining
+ - **🏷️ Dynamic Label Management**: Add, remove, and update classification labels on-the-fly
+ - **📊 Interactive Results**: Visual confidence scores with sortable data tables
+ - **⚡ Optimized Performance**: Sub-30ms inference on GPU with re-parameterized MobileOne blocks
+ - **🔒 Secure Admin Panel**: Token-protected label management interface
+ - **☁️ Hub Persistence**: Optional versioned label storage on Hugging Face Hub
 
+ ### API Access
+ - **REST API**: Fully accessible via Gradio's automatic API endpoints
+ - **Base64 Support**: Direct base64 image input for backend integration
+ - **Batch Processing**: Efficient handling of multiple classification requests
 
+ ## 🏗️ Architecture
+
+ ### Components
+ - **`app.py`**: Main Gradio interface with public/admin tabs and API endpoints
+ - **`handler.py`**: Core model management, inference logic, and label operations
+ - **`reparam.py`**: MobileOne re-parameterization for optimized inference
+ - **`items.json`**: Default label catalog with metadata
+
+ ### Model Details
+ - **Architecture**: MobileCLIP-B with re-parameterized MobileOne image encoder
+ - **Text Encoder**: Optimized CLIP text transformer
+ - **Embedding Cache**: Pre-computed text embeddings for fast inference
+ - **Device Support**: Automatic GPU/CPU detection with float16 optimization
+
+ ## 🚀 Quick Start
+
+ ### Environment Variables
+
+ Configure in your Space Settings → Variables and secrets:
 
  | Variable | Description | Required |
  |----------|-------------|----------|
+ | `ADMIN_TOKEN` | Secret token for admin operations | Yes (for admin) |
+ | `HF_LABEL_REPO` | Hub dataset for label storage (e.g., `user/labels`) | No |
+ | `HF_WRITE_TOKEN` | Token with write permissions to dataset repo | No |
+ | `HF_READ_TOKEN` | Token with read permissions (defaults to write token) | No |
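
A minimal sketch of how the variables in this table might be read at startup, including the documented fallback of `HF_READ_TOKEN` to the write token. `LabelConfig` and `load_config` are hypothetical names, not taken from `app.py` or `handler.py`, and the `hub_enabled` rule is an assumption about when Hub persistence would be switched on.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LabelConfig:
    """Hypothetical container for the settings listed in the table."""
    admin_token: Optional[str]
    label_repo: Optional[str]
    write_token: Optional[str]
    read_token: Optional[str]

    @property
    def hub_enabled(self) -> bool:
        # Assumption: Hub persistence needs a repo plus a token able to read it.
        return bool(self.label_repo and self.read_token)


def load_config(env: Mapping[str, str] = os.environ) -> LabelConfig:
    """Read the four variables, treating empty strings as unset."""
    def get(name: str) -> Optional[str]:
        return env.get(name) or None

    write_token = get("HF_WRITE_TOKEN")
    return LabelConfig(
        admin_token=get("ADMIN_TOKEN"),
        label_repo=get("HF_LABEL_REPO"),
        write_token=write_token,
        # The read token defaults to the write token, as the table states.
        read_token=get("HF_READ_TOKEN") or write_token,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise without touching the real environment.
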
+
+ ### Usage Examples
+
+ #### Web Interface
+ 1. Navigate to the Space URL
+ 2. Upload an image in the Classification tab
+ 3. Adjust top-k results (default: 10)
+ 4. View ranked predictions with confidence scores
+
+ #### API Usage
+
+ **Standard Classification:**
+ ```python
+ import requests
+
+ response = requests.post(
+     "YOUR_SPACE_URL/api/classify_image",
+     files={"image": open("photo.jpg", "rb")},
+     data={"top_k": 5}
+ )
+ results = response.json()
+ ```
+
+ **Base64 Input:**
+ ```python
+ import base64
+ import requests
+
+ with open("photo.jpg", "rb") as f:
+     img_base64 = base64.b64encode(f.read()).decode()
+
+ response = requests.post(
+     "YOUR_SPACE_URL/api/classify_base64",
+     json={
+         "image": img_base64,
+         "top_k": 10
+     }
+ )
+ results = response.json()
+ ```
+
+ ## 🔧 Admin Operations
+
+ ### Label Management
+
+ Authenticated admins can perform the following operations:
+
+ #### Add Labels
+ ```json
+ {
+   "op": "upsert_labels",
+   "token": "YOUR_ADMIN_TOKEN",
+   "items": [
+     {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
+     {"id": 101, "name": "airplane", "prompt": "a photo of an airplane"}
+   ]
+ }
+ ```
+
+ #### Reload Specific Version
+ ```json
+ {
+   "op": "reload_labels",
+   "token": "YOUR_ADMIN_TOKEN",
+   "version": 5
+ }
+ ```
+
+ #### Remove Labels
+ ```json
+ {
+   "op": "remove_labels",
+   "token": "YOUR_ADMIN_TOKEN",
+   "ids": [100, 101]
+ }
+ ```
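
The three payloads share an `op` field and a `token` field, so one way to picture the server side is a small dispatcher that checks the token and then branches on `op`. The sketch below is an assumption about that shape, not the code in `handler.py`: `handle_admin` is a hypothetical name, labels live in a plain in-memory dictionary keyed by ID, and `reload_labels` only echoes the requested version because loading it would need Hub access.

```python
import hmac
from typing import Any, Dict


def handle_admin(payload: Dict[str, Any], admin_token: str,
                 labels: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical dispatcher for the three admin payload shapes."""
    supplied = str(payload.get("token", ""))
    # Constant-time comparison avoids leaking token contents through timing.
    if not admin_token or not hmac.compare_digest(
        supplied.encode(), admin_token.encode()
    ):
        return {"ok": False, "error": "unauthorized"}

    op = payload.get("op")
    if op == "upsert_labels":
        for item in payload.get("items", []):
            labels[int(item["id"])] = dict(item)  # insert or overwrite by ID
    elif op == "remove_labels":
        for label_id in payload.get("ids", []):
            labels.pop(int(label_id), None)  # unknown IDs are ignored
    elif op == "reload_labels":
        # Assumption: the real handler would fetch this version from the Hub.
        return {"ok": True, "requested_version": payload.get("version")}
    else:
        return {"ok": False, "error": f"unknown op: {op!r}"}
    return {"ok": True, "count": len(labels)}
```

Rejecting an empty `admin_token` up front means an unset `ADMIN_TOKEN` disables admin operations rather than accepting an empty token.
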
+
+ ### Label Deduplication
+ - Automatic case-insensitive name deduplication
+ - Prevents duplicate entries (e.g., "cat", "Cat", "CAT" treated as same)
+ - ID-based deduplication for consistent label management
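
The deduplication rules listed here can be sketched as a merge that tracks seen IDs and case-folded names. `dedupe_labels` is a hypothetical helper, and the "first entry wins" policy is an assumption; the actual handler may update the existing entry instead of skipping the newcomer.

```python
from typing import Any, Dict, Iterable, List


def dedupe_labels(existing: Iterable[Dict[str, Any]],
                  incoming: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge incoming labels, skipping any whose ID or case-folded name is taken."""
    merged = list(existing)
    seen_ids = {item["id"] for item in merged}
    seen_names = {item["name"].strip().casefold() for item in merged}
    for item in incoming:
        key = item["name"].strip().casefold()
        if item["id"] in seen_ids or key in seen_names:
            continue  # assumption: the first entry wins
        merged.append(item)
        seen_ids.add(item["id"])
        seen_names.add(key)
    return merged
```

`str.casefold()` is used rather than `lower()` because it also normalizes characters whose lowercase form differs between languages.
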
+
+ ## 📦 Hub Integration
+
+ When configured with `HF_LABEL_REPO` and tokens, the system automatically:
+
+ 1. **Saves Snapshots**: Each label update creates versioned snapshots
+    - `snapshots/v{N}/embeddings.safetensors`: Pre-computed text embeddings
+    - `snapshots/v{N}/meta.json`: Label metadata and model info
+    - `snapshots/latest.json`: Points to current version
+
+ 2. **Loads on Startup**: Fetches latest snapshot or specified version
+ 3. **Fallback**: Uses local `items.json` if Hub unavailable
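
The load-then-fall-back order in the steps above can be sketched against a local copy of the repository layout. Everything here is illustrative: `snapshot_files` and `load_label_meta` are hypothetical names, the files are read from a local directory rather than downloaded from the Hub, and the `{"version": N}` shape of `latest.json` is an assumption because the README does not show its schema.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def snapshot_files(version: int) -> Dict[str, str]:
    """Repo-relative paths for one snapshot, following the documented layout."""
    base = f"snapshots/v{version}"
    return {
        "embeddings": f"{base}/embeddings.safetensors",
        "meta": f"{base}/meta.json",
    }


def load_label_meta(repo_dir: Optional[Path], items_path: Path,
                    version: Optional[int] = None) -> Dict[str, Any]:
    """Load snapshot metadata from a local repo copy, else use items.json."""
    try:
        if repo_dir is None:
            raise FileNotFoundError("no Hub repo configured")
        if version is None:
            # Assumption: latest.json looks like {"version": N}.
            pointer = json.loads((repo_dir / "snapshots/latest.json").read_text())
            version = int(pointer["version"])
        meta_path = repo_dir / snapshot_files(version)["meta"]
        return {"source": f"hub:v{version}", "meta": json.loads(meta_path.read_text())}
    except (OSError, KeyError, TypeError, ValueError):
        # Snapshot missing or unreadable: use the bundled catalog instead.
        return {"source": "local", "meta": json.loads(items_path.read_text())}
```

Returning the `source` alongside the metadata makes it visible in logs or the admin panel whether the app is serving Hub labels or the bundled defaults.
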
+
+ ## 🎨 Default Label Catalog
+
+ The bundled `items.json` includes 50+ kid-friendly objects with:
+ - Unique IDs and display names
+ - CLIP-optimized prompts
+ - Category metadata
+ - Fun facts and rarity ratings
+
+ Categories include animals, toys, food, vehicles, nature, and everyday objects.
+
+ ## ⚡ Performance Optimization
+
+ - **GPU Acceleration**: Automatic CUDA detection with float16 inference
+ - **CPU Fallback**: Graceful degradation with float32 precision
+ - **Embedding Cache**: Pre-computed text embeddings updated on label changes
+ - **Re-parameterization**: MobileOne blocks optimized for inference speed
+ - **Batch Processing**: Efficient matrix operations for multi-label scoring
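
To illustrate why the embedding cache helps, here is a dependency-free sketch of zero-shot scoring: text embeddings are normalized once when labels change, and each request only normalizes the image embedding and takes dot products. The real app would do this with tensors and a single matrix multiply; `normalize`, `classify`, and the `logit_scale` default of 100 (a value commonly used with CLIP-style models) are assumptions for illustration, not the contents of `handler.py`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def classify(image_emb: Sequence[float],
             text_cache: Dict[str, List[float]],
             top_k: int = 10,
             logit_scale: float = 100.0) -> List[Tuple[str, float]]:
    """Rank cached labels by softmax over scaled cosine similarity."""
    if not text_cache:
        return []
    img = normalize(image_emb)
    logits = {
        name: logit_scale * sum(a * b for a, b in zip(img, emb))
        for name, emb in text_cache.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((name, e / total) for name, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


# Built once per label change, then reused for every request.
cache = {"cat": normalize([1.0, 0.0]), "dog": normalize([0.0, 1.0])}
top = classify([0.9, 0.1], cache, top_k=1)
```

Because the cache already holds unit vectors, adding or removing a label only requires encoding that one prompt, not re-encoding the whole catalog.
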
+
+ ## 🔐 Security Considerations
+
+ - **Token Protection**: Admin operations require `ADMIN_TOKEN`
+ - **Private Datasets**: Keep label repos private for sensitive applications
+ - **Input Validation**: Automatic sanitization of uploaded images
+ - **Memory Management**: Images processed and discarded after inference
 
+ ## 📄 License
 
+ - **Model Weights**: Apple Sample Code License (ASCL)
+ - **Interface Code**: MIT License
 
+ ## 🤝 Contributing
 
+ Contributions welcome! Areas for improvement:
+ - Additional label management features
+ - Performance optimizations
+ - Extended API capabilities
+ - Multi-language support
 
+ ## 📚 Resources
 
+ - [MobileCLIP Paper](https://arxiv.org/abs/2311.17049)
+ - [OpenCLIP Library](https://github.com/mlfoundations/open_clip)
+ - [Gradio Documentation](https://gradio.app/docs)
+ - [Hugging Face Spaces](https://huggingface.co/spaces)