MJaheen committed on
Commit
cc5958e
·
1 Parent(s): fb609fe

Add new features and fixes


- Fix some issues, add documentation

LICENSE CHANGED
@@ -1,6 +1,6 @@
1
  MIT License
2
 
3
- Copyright (c) 2025 MJaheen
4
 
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
  of this software and associated documentation files (the "Software"), to deal
 
1
  MIT License
2
 
3
+ Copyright (c) 2025 MJaheen, [email protected]
4
 
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
  of this software and associated documentation files (the "Software"), to deal
README.md CHANGED
@@ -9,109 +9,380 @@ app_file: src/app.py
9
  python_version: "3.11"
10
  ---
11
 
12
- # 🐸 Pepe the Frog Meme Generator
13
 
14
- AI-powered meme generator using Stable Diffusion and LoRA fine-tuning.
15
 
16
  ---
17
 
18
- ## 🎮 Try It Online
19
 
20
- 🚀 **[Open in Hugging Face Spaces](https://huggingface.co/spaces/MJaheen/Pepe-Meme-Generator)**
21
 
22
  ---
23
 
24
- ## 🌟 Features
25
 
26
- - **Multiple Model Support**: Switch between fine-tuned LoRA and base models
27
- - Pepe Fine-tuned (LoRA) - Custom trained model
28
- - Base SD 1.5 - Standard Stable Diffusion
29
- - Dreamlike Photoreal 2.0 - Photorealistic style
30
- - Openjourney v4 - Artistic Midjourney-style
31
  - **Raw Prompt Mode**: Use exact prompts without automatic enhancements
32
- - Generate **custom Pepe memes** from text prompts
33
- - Multiple **style presets** (happy, sad, smug, angry, etc.)
34
- - **Add meme text overlays** with automatic "MJ" signature
35
- - **Real-time progress tracking** for each generation step
36
- - Adjustable generation parameters (CFG, steps, seed, etc.)
37
- - Batch generation and meme gallery system
38
- - **GPU & CPU compatible** with automatic optimization
39
 
40
  ---
41
 
42
- ## 💡 Example Prompts
43
 
44
- - "pepe the frog as a wizard"
45
- - "pepe coding on a laptop"
46
- - "pepe drinking coffee"
47
- - "smug pepe wearing sunglasses"
48
 
49
- ---
50
 
51
- ## 🚀 Quick Start (GitHub)
52
 
53
  ```bash
54
- # Clone
55
  git clone https://github.com/YOUR_USERNAME/pepe-meme-generator.git
56
  cd pepe-meme-generator
57
 
58
- # Install
59
  pip install -r requirements.txt
60
 
61
- # Run
62
  streamlit run src/app.py
63
  ```
64
 
65
 
66
  ---
67
 
68
- ## 📚 Project Structure
69
 
70
  pepe-meme-generator/
71
- ├── src/
72
- │ ├── app.py # Main Streamlit app
73
- │ ├── model/
74
- │ │ ├── generator.py # Generation logic
75
- │ │ └── config.py # Model configuration
76
- │ └── utils/
77
- └── image_processor.py
78
- ├── models/ # Model weights (not committed)
79
- ├── outputs/ # Generated memes (not committed)
80
- ├── requirements.txt
81
- ├── .gitignore
82
- └── README.md
83
 
84
  ---
85
 
86
- ## 🛠️ Tech Stack
87
 
88
- Model: Stable Diffusion 1.5 + LoRA
89
 
90
- Framework: PyTorch, Diffusers
91
 
92
- UI: Streamlit
93
 
94
- Processing: PIL, OpenCV
95
 
96
  ---
97
 
98
 
99
 
100
- This project demonstrates:
101
- - Diffusion model architecture
102
- - Transfer learning with LoRA
103
- - Text-to-image synthesis
104
 
105
  ---
106
- ## 🎓 🙏 Acknowledgments
107
 
108
- - [WorldQuant](https://www.wqu.edu/ai-lab-computer-vision)
109
- - [Stable Diffusion](https://github.com/CompVis/stable-diffusion)
110
- - [LoRA](https://github.com/microsoft/LoRA)
111
- - [Diffusers](https://github.com/huggingface/diffusers)
112
- - [Streamlit](https://github.com/streamlit/streamlit)
113
 
114
 
115
  ## 📜 License
116
 
117
- MIT License see LICENSE file.
9
  python_version: "3.11"
10
  ---
11
 
12
+ <div align="center">
13
 
14
+ # 🐸 Pepe the Frog AI Meme Generator
15
+
16
+ ### Create custom Pepe memes using AI-powered Stable Diffusion with LoRA fine-tuning
17
+
18
+ [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
19
+ [![Streamlit](https://img.shields.io/badge/Streamlit-1.28+-FF4B4B.svg)](https://streamlit.io)
20
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
21
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97-Hugging%20Face-orange)](https://huggingface.co/MJaheen/Pepe_The_Frog_model_v1_lora)
22
+
23
+ [Demo](https://huggingface.co/spaces/MJaheen/Pepe-Meme-Generator) • [Documentation](./docs/) • [Training Guide](./docs/TRAINING.md) • [Report Bug](https://github.com/YOUR_USERNAME/pepe-meme-generator/issues)
24
+
25
+ </div>
26
 
27
  ---
28
 
29
+ ## 📖 Table of Contents
30
 
31
+ - [Features](#features)
32
+ - [Quick Start](#-quick-start)
33
+ - [Installation](#-installation)
34
+ - [Usage](#-usage)
35
+ - [Model Information](#-model-information)
36
+ - [Performance Optimization](#-performance-optimization)
37
+ - [Project Structure](#-project-structure)
38
+ - [Training](#-training-your-own-model)
39
+ - [Contributing](#-contributing)
40
+ - [License](#-license)
41
+ - [Acknowledgments](#-acknowledgments)
42
 
43
  ---
44
 
45
+ ## Features
46
+
47
+ ### 🎨 **Multiple AI Models**
48
+ - **Pepe Fine-tuned LoRA** - Custom trained on Pepe dataset (1600 steps)
49
+ - **Pepe + LCM (FAST)** - 8x faster generation with LCM technology
50
+ - **Tiny SD** - Lightweight model for faster CPU generation
51
+ - **Small SD** - Balanced speed and quality
52
+ - **Base SD 1.5** - Standard Stable Diffusion
53
+ - **Dreamlike Photoreal 2.0** - Photorealistic style
54
+ - **Openjourney v4** - Artistic Midjourney-inspired style
55
+
56
+ ### ⚡ **Performance Features**
57
+ - **LCM Support**: Generate images in 6 steps (~30 seconds on CPU)
58
+ - **GPU Acceleration**: Automatic CUDA detection with xformers support
59
+ - **Memory Efficient**: Attention slicing and VAE slicing enabled
60
 
61
+ ### 🎭 **Generation Features**
62
+ - **Style Presets**: Happy, sad, smug, angry, crying, and more
63
  - **Raw Prompt Mode**: Use exact prompts without automatic enhancements
64
+ - **Text Overlays**: Add meme text with Impact font
65
+ - **Batch Generation**: Create multiple variations
66
+ - **Progress Tracking**: Real-time generation progress bar
67
+ - **Seed Control**: Reproducible generations with fixed seeds
68
+ - **Gallery System**: View and manage all generated memes
69
+
70
+ ### 🎯 **User Experience**
71
+ - **Model Hot-Swapping**: Switch models without restart (see the caching sketch after this list)
72
+ - **Interactive UI**: Clean Streamlit interface
73
+ - **Example Prompts**: Built-in inspiration gallery
74
+ - **Download Support**: Save images with one click
75
+ - **Responsive Design**: Works on desktop and mobile
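+
+ Hot-swapping works because the generator is cached once per model name via Streamlit's `st.cache_resource`. A simplified sketch of the pattern used in `src/app.py` (import paths assume the repo root is on `PYTHONPATH`; the field-copying step is elided):
+
+ ```python
+ import streamlit as st
+ from src.model.config import ModelConfig
+ from src.model.generator import PepeGenerator
+
+ @st.cache_resource  # one cache entry per distinct model_name
+ def load_generator(model_name: str) -> PepeGenerator:
+     config = ModelConfig()
+     model_config = config.AVAILABLE_MODELS[model_name]
+     # ...copy model_config fields (base model, LoRA, LCM flags) onto config...
+     return PepeGenerator(config)
+ ```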
76
 
77
  ---
78
 
79
+ ## 🚀 Quick Start
80
 
81
+ ### Try Online (No Installation)
82
 
83
+ 🌐 **[Open in Hugging Face Spaces](https://huggingface.co/spaces/MJaheen/Pepe-Meme-Generator)** - Run instantly in your browser!
84
 
85
+ ### Local Installation
86
 
87
  ```bash
88
+ # 1. Clone the repository
89
  git clone https://github.com/YOUR_USERNAME/pepe-meme-generator.git
90
  cd pepe-meme-generator
91
 
92
+ # 2. Create virtual environment (recommended)
93
+ python -m venv venv
94
+ source venv/bin/activate # On Windows: venv\Scripts\activate
95
+
96
+ # 3. Install dependencies
97
  pip install -r requirements.txt
98
 
99
+ # 4. Run the app
100
  streamlit run src/app.py
101
  ```
102
 
103
+ The app will open in your browser at `http://localhost:8501`
104
+
105
+ ---
106
+
107
+ ## 📦 Installation
108
+
109
+ ### System Requirements
110
+
111
+ - **Python**: 3.10 or higher
112
+ - **RAM**: 8GB minimum, 16GB recommended
113
+ - **GPU**: Optional (NVIDIA with CUDA for faster generation)
114
+ - **Storage**: ~5GB for models and dependencies
115
+
116
+ ### Dependencies
117
+
118
+ ```bash
119
+ # Core dependencies
120
+ pip install torch torchvision # PyTorch
121
+ pip install diffusers transformers accelerate # Diffusion models
122
+ pip install streamlit # Web interface
123
+ pip install pillow numpy scipy # Image processing
124
+ pip install peft safetensors # LoRA support
125
+ ```
126
+
127
+ Or install everything at once:
128
+
129
+ ```bash
130
+ pip install -r requirements.txt
131
+ ```
132
+
133
+ ### GPU Setup (Optional but Recommended)
134
+
135
+ For NVIDIA GPUs with CUDA:
136
+
137
+ ```bash
138
+ # Install PyTorch with CUDA support
139
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
140
+
141
+ # Install xformers for memory-efficient attention
142
+ pip install xformers
143
+ ```
144
+
145
+ ---
146
+
147
+ ## 🎮 Usage
148
+
149
+ ### Basic Usage
150
+
151
+ 1. **Select a Model**: Choose from the dropdown (try "Pepe + LCM (FAST)" for speed)
152
+ 2. **Enter a Prompt**: e.g., "pepe the frog as a wizard casting spells"
153
+ 3. **Adjust Settings**: Steps (6 for LCM, 25 for normal), guidance scale, etc.
154
+ 4. **Generate**: Click "Generate Meme" and wait
155
+ 5. **Download**: Save your creation!
156
+
157
+ ### Example Prompts
158
+
159
+ ```
160
+ pepe_style_frog, wizard casting magical spells, detailed
161
+ pepe_style_frog, programmer coding on laptop, cyberpunk style
162
+ pepe_style_frog, drinking coffee at sunrise, peaceful
163
+ pepe_style_frog, wearing sunglasses, smug expression
164
+ pepe_style_frog, crying with rain, emotional, dramatic lighting
165
+ ```
166
+
167
+ ### Advanced Features
168
+
169
+ #### **Using LCM for Fast Generation**
170
+ 1. Select "Pepe + LCM (FAST)" model
171
+ 2. Use 6 steps (optimal for LCM)
172
+ 3. Set guidance scale to 1.5
173
+ 4. Generate in ~30 seconds!
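+
+ For reference, a minimal diffusers sketch of the same LCM path (the app wires this up for you; the adapter ID below assumes the standard `latent-consistency/lcm-lora-sdv1-5` LCM-LoRA release):
+
+ ```python
+ from diffusers import StableDiffusionPipeline, LCMScheduler
+
+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
+ pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)  # swap in the LCM scheduler
+ pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")      # assumed adapter ID
+
+ image = pipe(
+     "pepe_style_frog, smug expression",
+     num_inference_steps=6,  # LCM converges in very few steps
+     guidance_scale=1.5,     # low CFG works best with LCM
+ ).images[0]
+ ```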
174
+
175
+ #### **Adding Text Overlays**
176
+ 1. Expand "Add Text" section
177
+ 2. Enter top and bottom text
178
+ 3. Text automatically styled in Impact font
179
+ 4. Signature "MJ" added to corner
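+
+ The same step outside the UI, using the project's `ImageProcessor` (a sketch; assumes the repo root is on `PYTHONPATH`, and the caption strings are just examples):
+
+ ```python
+ from PIL import Image
+ from src.utils.image_processor import ImageProcessor
+
+ img = Image.open("outputs/pepe.png")
+ meme = ImageProcessor.add_meme_text(img, top_text="ME WAITING", bottom_text="FOR THE GPU")
+ meme.save("outputs/pepe_meme.png")
+ ```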
180
+
181
+ #### **Reproducible Generations**
182
+ 1. Enable "Fixed Seed" in Advanced Settings
183
+ 2. Set a seed number (e.g., 42)
184
+ 3. Same seed + prompt = same image
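+
+ Under the hood, a fixed seed maps to a seeded `torch.Generator` (a sketch, assuming a loaded `pipe`):
+
+ ```python
+ import torch
+
+ generator = torch.Generator(device="cpu").manual_seed(42)  # same seed => same starting noise
+ image = pipe("pepe_style_frog, wizard casting spells", generator=generator).images[0]
+ ```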
185
+
186
+ ---
187
+
188
+ ## 🤖 Model Information
189
+
190
+ ### Fine-Tuned LoRA Model
191
+
192
+ **Model ID**: `MJaheen/Pepe_The_Frog_model_v1_lora`
193
+
194
+ **Training Details**:
195
+ - **Base Model**: Stable Diffusion v1.5
196
+ - **Method**: LoRA (Low-Rank Adaptation)
197
+ - **Dataset**: [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
198
+ - **Training Steps**: 2000
199
+ - **Resolution**: 512x512
200
+ - **Batch Size**: 1 (4 gradient accumulation)
201
+ - **Learning Rate**: 1e-4 (cosine schedule)
202
+ - **LoRA Rank**: 16
203
+ - **Precision**: Mixed FP16
204
+ - **Trigger Word**: `pepe_style_frog`
205
+
206
+ **Performance**:
207
+ - Quality: ⭐⭐⭐ (Good)
208
+ - Speed (CPU): ~4 minutes (25 steps)
209
+ - Speed (GPU): ~15 seconds (25 steps)
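+
+ To use the published LoRA directly with diffusers outside the app (a minimal sketch):
+
+ ```python
+ from diffusers import StableDiffusionPipeline
+
+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
+ pipe.load_lora_weights("MJaheen/Pepe_The_Frog_model_v1_lora")
+
+ image = pipe("pepe_style_frog, drinking coffee at sunrise").images[0]
+ ```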
210
 
211
  ---
212
 
213
+ ## 📁 Project Structure
214
 
215
+ ```
216
  pepe-meme-generator/
217
+ ├── src/ # Source code
218
+ │ ├── app.py # Main Streamlit application
219
+ │ ├── model/ # Model management
220
+ │ │ ├── __init__.py
221
+ │ │ ├── config.py # Model configurations
222
+ │ │ └── generator.py # Image generation logic
223
+ │ └── utils/ # Utility functions
224
+ │ ├── __init__.py
225
+ │ └── image_processor.py # Image processing utilities
226
+ ├── docs/ # Documentation
227
+ │ └── TRAINING.md # Model training guide
228
+ ├── models/ # Downloaded models (gitignored)
229
+ ├── outputs/ # Generated images (gitignored)
230
+ ├── scripts/ # Utility scripts
231
+ ├── tests/ # Test files
232
+ ├── diffusion_model_finetuning.ipynb # Training notebook
233
+ ├── requirements.txt # Python dependencies
234
+ ├── .gitignore # Git ignore rules
235
+ ├── .dockerignore # Docker ignore rules
236
+ ├── Dockerfile # Docker configuration
237
+ ├── LICENSE # MIT License
238
+ └── README.md # This file
239
+ ```
240
 
241
  ---
242
 
243
+ ## 🎓 Training Your Own Model
244
+
245
+ Want to fine-tune your own Pepe model or create a different character?
246
+
247
+ ### Quick Training Overview
248
+
249
+ ```bash
250
+ # 1. Prepare your dataset (images + captions)
251
+ # 2. Run the training script
252
+ accelerate launch train_text_to_image_lora.py \
253
+ --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
254
+ --train_data_dir="./your-data" \
255
+ --resolution=512 \
256
+ --train_batch_size=1 \
257
+ --gradient_accumulation_steps=4 \
258
+ --max_train_steps=2000 \
259
+ --learning_rate=1e-4 \
260
+ --lr_scheduler="cosine" \
261
+ --output_dir="./output" \
262
+ --rank=16
263
+ ```
264
+
265
+ ### Complete Training Guide
266
+
267
+ See **[docs/TRAINING.md](./docs/TRAINING.md)** for:
268
+ - Dataset preparation
269
+ - Training configuration
270
+ - Hyperparameter tuning
271
+ - Validation and testing
272
+ - Model upload to Hugging Face
273
+
274
+ Or check out the **[diffusion_model_finetuning.ipynb](./diffusion_model_finetuning.ipynb)** notebook!
275
+
276
+ ---
277
 
278
+ ## 🛠️ Technology Stack
279
 
280
+ ### Core Technologies
281
+ - **[PyTorch](https://pytorch.org/)** - Deep learning framework
282
+ - **[Diffusers](https://github.com/huggingface/diffusers)** - Diffusion models library
283
+ - **[Transformers](https://github.com/huggingface/transformers)** - NLP models
284
+ - **[PEFT](https://github.com/huggingface/peft)** - Parameter-efficient fine-tuning (LoRA)
285
+ - **[Streamlit](https://streamlit.io/)** - Web UI framework
286
 
287
+ ### AI/ML Components
288
+ - **Stable Diffusion 1.5** - Base diffusion model
289
+ - **LoRA** - Low-Rank Adaptation for efficient fine-tuning
290
+ - **LCM** - Latent Consistency Model for fast inference
291
+ - **DPM Solver** - Fast diffusion sampling
292
 
293
+ ### Image Processing
294
+ - **Pillow (PIL)** - Image manipulation
295
+ - **NumPy** - Numerical operations
296
+ - **SciPy** - Scientific computing
297
 
298
  ---
299
 
300
+ ## 🤝 Contributing
301
+
302
+ Contributions are welcome! Here's how you can help:
303
+
304
+ ### Ways to Contribute
305
+ - 🐛 Report bugs
306
+ - 💡 Suggest new features
307
+ - 📝 Improve documentation
308
+ - 🎨 Add new style presets
309
+ - ⚡ Optimize performance
310
+ - 🧪 Add tests
311
+
312
+ ### Development Setup
313
+
314
+ ```bash
315
+ # Clone and setup
316
+ git clone https://github.com/YOUR_USERNAME/pepe-meme-generator.git
317
+ cd pepe-meme-generator
318
+ python -m venv venv
319
+ source venv/bin/activate
320
+ pip install -r requirements.txt
321
 
322
+ # Make your changes
323
+ # Test locally
324
+ streamlit run src/app.py
325
 
326
+ # Submit a pull request
+ ```
327
 
328
  ---
329
 
330
+ ## 🐛 Troubleshooting
331
+
332
+ ### Common Issues
333
+
334
+ **Issue**: Out of memory error
335
+ **Solution**: Reduce resolution to 512x512, use CPU mode, or enable memory optimizations
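+
+ The memory optimizations are the standard diffusers toggles (a sketch; they trade some speed for a much lower peak memory footprint):
+
+ ```python
+ pipe.enable_attention_slicing()  # compute attention in slices
+ pipe.enable_vae_slicing()        # decode latents in slices
+ ```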
336
 
337
+ **Issue**: Slow generation on CPU
338
+ **Solution**: Use "Pepe + LCM (FAST)" model with 6 steps
339
+
340
+ **Issue**: Model not loading
341
+ **Solution**: Clear Streamlit cache with "Clear Cache & Reload" button
342
+
343
+ **Issue**: Import errors
344
+ **Solution**: Reinstall dependencies: `pip install -r requirements.txt --force-reinstall`
345
+
346
+
347
+ ---
348
 
349
  ## 📜 License
350
 
351
+ This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
352
+
353
+ ### Model Licenses
354
+ - **Stable Diffusion 1.5**: CreativeML Open RAIL-M License
355
+ - **Pepe LoRA**: MIT License
356
+ - **Training Dataset**: Check [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
357
+
358
+ ---
359
+
360
+ ## 🙏 Acknowledgments
361
+
362
+ ### Special Thanks
363
+ - **[WorldQuant University](https://www.wqu.edu/ai-lab-computer-vision)** - AI/ML education and resources
364
+ - **[Hugging Face](https://huggingface.co/)** - Model hosting and diffusers library
365
+ - **[Stability AI](https://stability.ai/)** - Stable Diffusion model
366
+ - **[Microsoft](https://github.com/microsoft/LoRA)** - LoRA technique
367
+ - **[iresidentevil](https://huggingface.co/iresidentevil)** - Pepe dataset
368
+
369
+
370
+ ## 📞 Contact & Support
371
+
372
+ - **Issues**: [email protected]
373
+
374
+ ---
375
+
376
+ ## 🌟 Star History
377
+
378
+ If you find this project useful, please consider giving it a ⭐ star on GitHub!
379
+
380
+ ---
381
+
382
+ <div align="center">
383
+
384
+ **Made with ❤️ by MJaheen**
385
+
386
+ *Generate Pepes responsibly! 🐸*
387
+
388
+ </div>
diffusion_model_finetuning.ipynb ADDED
@@ -0,0 +1,482 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "q4KpnNL4lY6q"
7
+ },
8
+ "source": [
9
+ "### Getting Ready"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "code",
14
+ "source": [
15
+ "#!pip install datasets\n",
16
+ "#!pip uninstall -y diffusers\n",
17
+ "!git clone https://github.com/huggingface/diffusers.git\n",
18
+ "!pip install git+https://github.com/huggingface/diffusers.git\n",
19
+ "#!pip install --upgrade transformers accelerate safetensors torch torchvision"
20
+ ],
21
+ "metadata": {
22
+ "id": "yOvCmByVINi7",
23
+ "collapsed": true
24
+ },
25
+ "execution_count": null,
26
+ "outputs": []
27
+ },
28
+ {
29
+ "cell_type": "code",
30
+ "source": [
31
+ "from google.colab import drive\n",
32
+ "drive.mount('/content/drive')\n"
33
+ ],
34
+ "metadata": {
35
+ "id": "I4vsjgK2AbgI"
36
+ },
37
+ "execution_count": null,
38
+ "outputs": []
39
+ },
40
+ {
41
+ "cell_type": "code",
42
+ "source": [
43
+ "#Add trigger word to dataset and create the training paramters\n",
44
+ "\n",
45
+ "import os\n",
46
+ "import json\n",
47
+ "from datasets import load_dataset\n",
48
+ "from accelerate.utils import write_basic_config\n",
49
+ "from huggingface_hub import create_repo, upload_folder\n",
50
+ "\n",
51
+ "# --- 2. Configuration ---\n",
52
+ "# This is where you set all the important parameters for the training job.\n",
53
+ "\n",
54
+ "# Model and Dataset Parameters\n",
55
+ "base_model_id = \"runwayml/stable-diffusion-v1-5\"\n",
56
+ "dataset_name = \"iresidentevil/pepe_the_frog\" # The original dataset\n",
57
+ "text_column = \"prompt\"\n",
58
+ "image_column = \"image\"\n",
59
+ "trigger_word = \"pepe_style_frog\" # The trigger word we decided on\n",
60
+ "\n",
61
+ "# Training Parameters\n",
62
+ "output_dir = \"/content/drive/MyDrive/pepe-lora-sdxl-turbo_2\" # Where the trained LoRA will be saved\n",
63
+ "resolution = 512 # SDXL-Turbo works well at 512x512. Higher resolutions need more VRAM.\n",
64
+ "learning_rate = 1e-4\n",
65
+ "train_batch_size = 1 # Keep this at 1 for a small dataset to see each image.\n",
66
+ "gradient_accumulation_steps = 4\n",
67
+ "max_train_steps = 500 # A good starting point for a small dataset. Adjust as needed.\n",
68
+ "checkpointing_steps = 100 # Save a checkpoint every 100 steps.\n",
69
+ "\n",
70
+ "# LoRA Specific Parameters\n",
71
+ "lora_rank = 16 # Rank (dimension) of the LoRA. 16 is a good balance.\n",
72
+ "\n",
73
+ "# Hugging Face Hub Parameters\n",
74
+ "hf_hub_repo_id = \"your-username/pepe-lora-sdxl-turbo\" # Change to your Hub username and desired repo name\n",
75
+ "push_to_hub = True # Set to True to automatically upload your LoRA to the Hub\n",
76
+ "\n",
77
+ "\n",
78
+ "# --- 3. Prepare Dataset in \"Image Folder\" format ---\n",
79
+ "# This section now creates a local folder with images and a metadata.jsonl file,\n",
80
+ "# which is the format expected by the training script.\n",
81
+ "\n",
82
+ "print(\"Loading original dataset...\")\n",
83
+ "dataset = load_dataset(dataset_name, split=\"train\")\n",
84
+ "\n",
85
+ "\n",
86
+ "image_folder_path = \"/content/drive/MyDrive/pepe-data\"\n",
87
+ "os.makedirs(image_folder_path, exist_ok=True)\n",
88
+ "print(f\"Created directory for prepared data: {image_folder_path}\")\n",
89
+ "\n",
90
+ "metadata_file_path = os.path.join(image_folder_path, \"metadata.jsonl\")\n",
91
+ "\n",
92
+ "with open(metadata_file_path, \"w\") as f:\n",
93
+ " for i, example in enumerate(dataset):\n",
94
+ " # Get image and caption\n",
95
+ " image = example[image_column]\n",
96
+ " caption = example[text_column]\n",
97
+ "\n",
98
+ " # Add the trigger word\n",
99
+ " full_caption = f\"{trigger_word} {caption}\"\n",
100
+ "\n",
101
+ " # Save the image\n",
102
+ " image_filename = f\"image_{i}.png\"\n",
103
+ " image.save(os.path.join(image_folder_path, image_filename))\n",
104
+ "\n",
105
+ " # Write the metadata entry\n",
106
+ " metadata_entry = {\n",
107
+ " \"file_name\": image_filename,\n",
108
+ " text_column: full_caption\n",
109
+ " }\n",
110
+ " f.write(json.dumps(metadata_entry) + \"\\n\")\n",
111
+ "\n",
112
+ "print(f\"Dataset prepared and saved in 'image folder' format at: {image_folder_path}\")\n",
113
+ "\n",
114
+ "\n",
115
+ "# --- 4. Set up the Training Command ---\n",
116
+ "# This command now points to our correctly formatted image folder.\n",
117
+ "write_basic_config()\n",
118
+ "\n",
119
+ "command = [\n",
120
+ " \"accelerate\", \"launch\",\n",
121
+ " \"train_text_to_image_lora.py\",\n",
122
+ " f\"--pretrained_model_name_or_path={base_model_id}\",\n",
123
+ " f\"--train_data_dir={image_folder_path}\",\n",
124
+ " f\"--caption_column={text_column}\",\n",
125
+ " f\"--image_column={image_column}\",\n",
126
+ " f\"--dataloader_num_workers=8\",\n",
127
+ " f\"--resolution={resolution}\", \"--center_crop\", \"--random_flip\",\n",
128
+ " f\"--train_batch_size={train_batch_size}\",\n",
129
+ " f\"--gradient_accumulation_steps={gradient_accumulation_steps}\",\n",
130
+ " f\"--max_train_steps={max_train_steps}\",\n",
131
+ " f\"--learning_rate={learning_rate}\",\n",
132
+ " \"--lr_scheduler=constant\",\n",
133
+ " \"--lr_warmup_steps=0\",\n",
134
+ " f\"--output_dir={output_dir}\",\n",
135
+ " f\"--rank={lora_rank}\",\n",
136
+ " f\"--validation_prompt='{trigger_word} a sad frog in a blue hoodie, cartoon style'\",\n",
137
+ " f\"--checkpointing_steps={checkpointing_steps}\",\n",
138
+ " \"--checkpoints_total_limit=3\",\n",
139
+ "]\n",
140
+ "\n",
141
+ "if push_to_hub:\n",
142
+ " command.extend([f\"--push_to_hub\", f\"--hub_model_id={hf_hub_repo_id}\"])\n",
143
+ "\n",
144
+ "training_command_str = \" \".join(command)\n",
145
+ "\n",
146
+ "\n",
147
+ "# --- 5. Execute the Training ---\n",
148
+ "print(\"\\n\" + \"=\"*80)\n",
149
+ "print(\" TRAINING COMMAND\")\n",
150
+ "print(\"=\"*80)\n",
151
+ "print(\"The following command will be executed in your terminal:\")\n",
152
+ "print(training_command_str)\n",
153
+ "print(\"\\n\" + \"=\"*80)\n",
154
+ "print(\"To start training, copy the command above and paste it into your terminal.\")\n",
155
+ "print(\"Make sure you are in the correct environment where the diffusers examples are located.\")\n",
156
+ "print(\"You may need to clone the diffusers repo first: git clone https://github.com/huggingface/diffusers.git\")\n",
157
+ "print(\"CORRECTED PATH: Then navigate to: cd diffusers/examples/text_to_image\")\n",
158
+ "print(\"=\"*80)\n",
159
+ "\n"
160
+ ],
161
+ "metadata": {
162
+ "id": "RPv7Gv5h--SO"
163
+ },
164
+ "execution_count": null,
165
+ "outputs": []
166
+ },
167
+ {
168
+ "cell_type": "code",
169
+ "execution_count": null,
170
+ "metadata": {
171
+ "id": "yGDgzchblY6s"
172
+ },
173
+ "outputs": [],
174
+ "source": [
175
+ "import os\n",
176
+ "import sys\n",
177
+ "import datasets\n",
178
+ "import diffusers\n",
179
+ "import huggingface_hub\n",
180
+ "import requests\n",
181
+ "import torch\n",
182
+ "from dotenv import load_dotenv\n",
183
+ "from huggingface_hub import HfApi\n",
184
+ "from IPython.display import display"
185
+ ]
186
+ },
187
+ {
188
+ "cell_type": "markdown",
189
+ "metadata": {
190
+ "id": "6hoZLPDalY6t"
191
+ },
192
+ "source": [
193
+ "We'll print out version number of the critical packages, to help with future reproducibility."
194
+ ]
195
+ },
196
+ {
197
+ "cell_type": "code",
198
+ "execution_count": null,
199
+ "metadata": {
200
+ "id": "CaRvn_celY6t"
201
+ },
202
+ "outputs": [],
203
+ "source": [
204
+ "print(\"Platform:\", sys.platform)\n",
205
+ "print(\"Python version:\", sys.version)\n",
206
+ "print(\"---\")\n",
207
+ "print(\"datasets version: \", datasets.__version__)\n",
208
+ "print(\"diffusers version: \", diffusers.__version__)\n",
209
+ "print(\"huggingface_hub version: \", huggingface_hub.__version__)\n",
210
+ "print(\"torch version:\", torch.__version__)"
211
+ ]
212
+ },
213
+ {
214
+ "cell_type": "markdown",
215
+ "metadata": {
216
+ "id": "VLBQ_2A0lY6u"
217
+ },
218
+ "source": [
219
+ "Let's check if a GPU is available. If not, this notebook will take a long time to run!"
220
+ ]
221
+ },
222
+ {
223
+ "cell_type": "code",
224
+ "execution_count": null,
225
+ "metadata": {
226
+ "id": "jWTKdjUDlY6u"
227
+ },
228
+ "outputs": [],
229
+ "source": [
230
+ "if torch.cuda.is_available():\n",
231
+ " device = \"cuda\"\n",
232
+ " dtype = torch.float16\n",
233
+ "else:\n",
234
+ " device = \"cpu\"\n",
235
+ " dtype = torch.float32\n",
236
+ "\n",
237
+ "print(f\"Using {device} device with {dtype} data type.\")"
238
+ ]
239
+ },
240
+ {
241
+ "cell_type": "markdown",
242
+ "metadata": {
243
+ "id": "RCI8s5uylY6u"
244
+ },
245
+ "source": [
246
+ "### Load Stable Diffusion"
247
+ ]
248
+ },
249
+ {
250
+ "cell_type": "code",
251
+ "execution_count": null,
252
+ "metadata": {
253
+ "id": "2RU4U5mulY6w"
254
+ },
255
+ "outputs": [],
256
+ "source": [
257
+ "\n",
258
+ "MODEL_NAME = \"runwayml/stable-diffusion-v1-5\"\n",
259
+ "\n",
260
+ "pipeline = diffusers.AutoPipelineForText2Image.from_pretrained(\n",
261
+ " MODEL_NAME, torch_dtype=dtype\n",
262
+ ")\n",
263
+ "pipeline.to(device)\n",
264
+ "\n",
265
+ "print(type(pipeline))"
266
+ ]
267
+ },
268
+ {
269
+ "cell_type": "markdown",
270
+ "metadata": {
271
+ "id": "BMvqxn99lY6w"
272
+ },
273
+ "source": [
274
+ "Test base Model"
275
+ ]
276
+ },
277
+ {
278
+ "cell_type": "code",
279
+ "execution_count": null,
280
+ "metadata": {
281
+ "id": "-kBJqj9xlY6w"
282
+ },
283
+ "outputs": [],
284
+ "source": [
285
+ "images = pipeline([\"pepe the frog rolling eyes\"]*1).images\n",
286
+ "\n",
287
+ "for im in images:\n",
288
+ " display(im)"
289
+ ]
290
+ },
291
+ {
292
+ "cell_type": "code",
293
+ "execution_count": null,
294
+ "metadata": {
295
+ "id": "HqZRLoajlY6x"
296
+ },
297
+ "outputs": [],
298
+ "source": [
299
+ "#DATASET_NAME = \"worldquant-university/maya-dataset-v1\"\n",
300
+ "DATASET_NAME= \"iresidentevil/pepe_the_frog\"\n",
301
+ "data_builder = datasets.load_dataset_builder(DATASET_NAME)\n",
302
+ "\n",
303
+ "print(data_builder.dataset_name)"
304
+ ]
305
+ },
306
+ {
307
+ "cell_type": "code",
308
+ "execution_count": null,
309
+ "metadata": {
310
+ "id": "4EeHRlBmlY6x"
311
+ },
312
+ "outputs": [],
313
+ "source": [
314
+ "print(data_builder.info.features)"
315
+ ]
316
+ },
317
+ {
318
+ "cell_type": "code",
319
+ "execution_count": null,
320
+ "metadata": {
321
+ "id": "rgXvHJJVlY6y"
322
+ },
323
+ "outputs": [],
324
+ "source": [
325
+ "print(data_builder.info.splits)"
326
+ ]
327
+ },
328
+ {
329
+ "cell_type": "code",
330
+ "execution_count": null,
331
+ "metadata": {
332
+ "id": "-L2YvGMnlY6y"
333
+ },
334
+ "outputs": [],
335
+ "source": [
336
+ "data = datasets.load_dataset(DATASET_NAME)\n",
337
+ "\n",
338
+ "print(data)"
339
+ ]
340
+ },
341
+ {
342
+ "cell_type": "code",
343
+ "execution_count": null,
344
+ "metadata": {
345
+ "id": "k2iL94ILlY6z"
346
+ },
347
+ "outputs": [],
348
+ "source": [
349
+ "data[\"train\"][\"image\"]"
350
+ ]
351
+ },
352
+ {
353
+ "cell_type": "code",
354
+ "execution_count": null,
355
+ "metadata": {
356
+ "id": "6vBJgSPnlY6z"
357
+ },
358
+ "outputs": [],
359
+ "source": [
360
+ "# The values are PIL images, so they will be displayed\n",
361
+ "# automatically by Jupyter.\n",
362
+ "data[\"train\"][\"image\"][3]"
363
+ ]
364
+ },
365
+ {
366
+ "cell_type": "code",
367
+ "execution_count": null,
368
+ "metadata": {
369
+ "id": "Kbj0aOW9lY6z"
370
+ },
371
+ "outputs": [],
372
+ "source": [
373
+ "# Use dictionary indexing to look up the text values.\n",
374
+ "data[\"train\"][\"prompt\"]"
375
+ ]
376
+ },
377
+ {
378
+ "cell_type": "markdown",
379
+ "metadata": {
380
+ "id": "Q0RrkjXVlY60"
381
+ },
382
+ "source": [
383
+ "### LoRA Fine-tuning"
384
+ ]
385
+ },
386
+ {
387
+ "cell_type": "code",
388
+ "execution_count": null,
389
+ "metadata": {
390
+ "id": "36Jc_ijlwD75"
391
+ },
392
+ "outputs": [],
393
+ "source": [
394
+ "%cd diffusers/examples/text_to_image\n",
395
+ "\n",
396
+ "!accelerate launch train_text_to_image_lora.py \\\n",
397
+ " --pretrained_model_name_or_path=\"runwayml/stable-diffusion-v1-5\" \\\n",
398
+ " --train_data_dir=image_folder_path \\\n",
399
+ " --caption_column=\"prompt\" \\\n",
400
+ " --image_column=\"image\" \\\n",
401
+ " --resolution=512 --center_crop --random_flip \\\n",
402
+ " --train_batch_size=1 \\\n",
403
+ " --gradient_accumulation_steps=4 \\\n",
404
+ " --max_train_steps=2000 \\\n",
405
+ " --learning_rate=1e-4 \\\n",
406
+ " --lr_scheduler=\"cosine\" \\\n",
407
+ " --lr_warmup_steps=0 \\\n",
408
+ " --output_dir=output_dir \\\n",
409
+ " --rank=16 \\\n",
410
+ " --validation_prompt=\"pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise\" \\\n",
411
+ " --seed=42 \\\n",
412
+ " --mixed_precision=\"fp16\" \\\n",
413
+ " --checkpointing_steps=150"
414
+ ]
415
+ },
416
+ {
417
+ "cell_type": "markdown",
418
+ "metadata": {
419
+ "id": "VKOcWmJ9lY62"
420
+ },
421
+ "source": [
422
+ "### Load LoRA Weights"
423
+ ]
424
+ },
425
+ {
426
+ "cell_type": "code",
427
+ "execution_count": null,
428
+ "metadata": {
429
+ "id": "SBGjOCmTlY63"
430
+ },
431
+ "outputs": [],
432
+ "source": [
433
+ "pipeline.load_lora_weights(\n",
434
+ " output_dir,\n",
435
+ "\n",
436
+ "\n",
437
+ " weight_name=\"pytorch_lora_weights.safetensors\",\n",
438
+ ")\n",
439
+ "pipeline.safety_checker = None"
440
+ ]
441
+ },
442
+ {
443
+ "cell_type": "code",
444
+ "execution_count": null,
445
+ "metadata": {
446
+ "id": "RYRckHGLlY63"
447
+ },
448
+ "outputs": [],
449
+ "source": [
450
+ "images = pipeline([\"pepe_style_frog making fun of rabbit that racing a tortile\"]).images\n",
451
+ "\n",
452
+ "for im in images:\n",
453
+ " display(im)"
454
+ ]
455
+ }
456
+ ],
457
+ "metadata": {
458
+ "accelerator": "GPU",
459
+ "colab": {
460
+ "gpuType": "T4",
461
+ "provenance": []
462
+ },
463
+ "kernelspec": {
464
+ "display_name": "Python 3",
465
+ "name": "python3"
466
+ },
467
+ "language_info": {
468
+ "codemirror_mode": {
469
+ "name": "ipython",
470
+ "version": 3
471
+ },
472
+ "file_extension": ".py",
473
+ "mimetype": "text/x-python",
474
+ "name": "python",
475
+ "nbconvert_exporter": "python",
476
+ "pygments_lexer": "ipython3",
477
+ "version": "3.11.0"
478
+ }
479
+ },
480
+ "nbformat": 4,
481
+ "nbformat_minor": 0
482
+ }
docs/TRAINING.md ADDED
@@ -0,0 +1,343 @@
1
+ # 🎓 Model Training Guide
2
+
3
+ This guide covers how to fine-tune your own Stable Diffusion model using LoRA (Low-Rank Adaptation) for creating custom character models like our Pepe generator.
4
+
5
+ ---
6
+
7
+ ## 📖 Table of Contents
8
+
9
+ - [Overview](#overview)
10
+ - [Prerequisites](#prerequisites)
11
+ - [Dataset Preparation](#dataset-preparation)
12
+ - [Training Configuration](#training-configuration)
13
+ - [Running the Training](#running-the-training)
14
+ - [Model Upload](#model-upload)
15
+
16
+
17
+ ---
18
+
19
+ ## 🎯 Overview
20
+
21
+ ### What is LoRA?
22
+
23
+ **LoRA (Low-Rank Adaptation)** is a parameter-efficient fine-tuning technique that:
24
+ - ✅ Trains only a small fraction of parameters (~0.5% of full model)
25
+ - ✅ Requires significantly less VRAM (~10GB vs 40GB+)
26
+ - ✅ Maintains base model quality while adding custom styles
27
+ - ✅ Produces small, portable adapter files (~100MB vs 4GB+)
28
+ - ✅ Can be combined with other LoRAs
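+
+ Concretely, LoRA freezes each pretrained weight matrix `W` and learns a low-rank update, so the adapted weight is `W' = W + (α/r)·B·A` with `B ∈ ℝ^(d×r)`, `A ∈ ℝ^(r×k)`, and rank `r ≪ min(d, k)`. Only `A` and `B` are trained, which is why the adapter file stays tiny relative to the base model.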
29
+
30
+ ### Our Training Setup
31
+
32
+ **Model**: Pepe the Frog LoRA
33
+ **Base**: Stable Diffusion v1.5
34
+ **Dataset**: [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
35
+ **Result**: [MJaheen/Pepe_The_Frog_model_v1_lora](https://huggingface.co/MJaheen/Pepe_The_Frog_model_v1_lora)
36
+ **Training Time**: ~2-3 hours on T4 GPU (Google Colab)
37
+
38
+ ---
39
+
40
+ ## 🛠️ Prerequisites
41
+
42
+ ### Hardware Requirements
43
+
44
+ **Minimum**:
45
+ - GPU: NVIDIA GPU with 10GB+ VRAM (e.g., RTX 3080, T4)
46
+ - RAM: 16GB system RAM
47
+ - Storage: 20GB free space
48
+
49
+ **Recommended**:
50
+ - GPU: NVIDIA A100, V100, or RTX 4090
51
+ - RAM: 32GB system RAM
52
+ - Storage: 50GB+ SSD
53
+
54
+ **Cloud Options**:
55
+ - Google Colab (Free T4 GPU)
56
+ - Kaggle Notebooks (Free GPU)
57
+ - Lambda Labs
58
+ - RunPod
59
+ - Vast.ai
60
+
61
+ ### Software Requirements
62
+
63
+ ```bash
64
+ # Core dependencies
65
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
66
+ pip install diffusers==0.31.0
67
+ pip install transformers==4.45.1
68
+ pip install accelerate==0.34.2
69
+ pip install "peft>=0.11.0" # quoted so the shell doesn't treat >= as a redirect
70
+ pip install safetensors==0.4.4
71
+ pip install datasets
72
+ pip install bitsandbytes # For 8-bit Adam optimizer (optional)
73
+ ```
74
+
75
+ ---
76
+
77
+ ## 📂 Dataset Preparation
78
+
79
+ ### Dataset Structure
80
+
81
+ Your dataset should follow this structure:
82
+
83
+ ```
84
+ dataset/
85
+ ├── image_1.png
86
+ ├── image_2.png
87
+ ├── image_3.png
88
+ └── metadata.jsonl # or metadata.csv
89
+ ```
90
+
91
+ ### Metadata Format
92
+
93
+ **Option 1: JSONL (Recommended)**
94
+
95
+ ```jsonl
96
+ {"file_name": "image_1.png", "prompt": "pepe_style_frog, happy pepe smiling"}
97
+ {"file_name": "image_2.png", "prompt": "pepe_style_frog, sad pepe crying"}
98
+ {"file_name": "image_3.png", "prompt": "pepe_style_frog, pepe drinking coffee"}
99
+ ```
100
+
101
+ **Option 2: CSV**
102
+
103
+ ```csv
104
+ file_name,prompt
105
+ image_1.png,"pepe_style_frog, happy pepe smiling"
106
+ image_2.png,"pepe_style_frog, sad pepe crying"
107
+ image_3.png,"pepe_style_frog, pepe drinking coffee"
108
+ ```
109
+
110
+ ### Dataset Best Practices
111
+
112
+ 1. **Image Quality**
113
+ - Resolution: 512x512 or higher
114
+ - Format: PNG or JPG
115
+ - Clear, well-lit images
116
+ - Varied poses and expressions
117
+
118
+ 2. **Caption Quality**
119
+ - Include trigger word (e.g., `pepe_style_frog`)
120
+ - Describe key features and actions
121
+ - Be consistent in naming conventions
122
+ - 5-15 words per caption optimal
123
+
124
+ 3. **Dataset Size**
125
+ - Minimum: 20-50 images
126
+ - Optimal: 100-500 images
127
+ - More images = better generalization
128
+
129
+ 4. **Diversity**
130
+ - Various angles and poses
131
+ - Different expressions
132
+ - Multiple backgrounds
133
+ - Different lighting conditions
134
+
135
+ ### Our Pepe Dataset
136
+
137
+ We used **[iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)** which contains:
138
+ - ~200 high-quality Pepe images
139
+ - Consistent 512x512 resolution
140
+ - Varied expressions and styles
141
+ - Pre-captioned with trigger word
142
+
143
+ ---
144
+
145
+ ## ⚙️ Training Configuration
146
+
147
+ ### Training Hyperparameters
148
+
149
+ Here's the exact configuration we used for the Pepe model:
150
+
151
+ ```bash
152
+ accelerate launch train_text_to_image_lora.py \
153
+ --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
154
+ --train_data_dir="/path/to/pepe-data" \
155
+ --caption_column="prompt" \
156
+ --image_column="image" \
157
+ --resolution=512 \
158
+ --center_crop \
159
+ --random_flip \
160
+ --train_batch_size=1 \
161
+ --gradient_accumulation_steps=4 \
162
+ --max_train_steps=2000 \
163
+ --learning_rate=1e-4 \
164
+ --lr_scheduler="cosine" \
165
+ --lr_warmup_steps=0 \
166
+ --output_dir="./output" \
167
+ --rank=16 \
168
+ --validation_prompt="pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise" \
169
+ --validation_epochs=5 \
170
+ --seed=42 \
171
+ --mixed_precision="fp16" \
172
+ --checkpointing_steps=150
173
+ ```
174
+
175
+ ### Parameter Explanation
176
+
177
+ | Parameter | Value | Description |
178
+ |-----------|-------|-------------|
179
+ | `pretrained_model_name_or_path` | `runwayml/stable-diffusion-v1-5` | Base model to fine-tune |
180
+ | `train_data_dir` | `/path/to/data` | Path to your dataset |
181
+ | `resolution` | `512` | Image resolution (512x512) |
182
+ | `train_batch_size` | `1` | Batch size per GPU |
183
+ | `gradient_accumulation_steps` | `4` | Effective batch size = 1 * 4 = 4 |
184
+ | `max_train_steps` | `2000` | Total training steps |
185
+ | `learning_rate` | `1e-4` | Initial learning rate |
186
+ | `lr_scheduler` | `cosine` | Learning rate schedule |
187
+ | `rank` | `16` | LoRA rank (higher = more parameters) |
188
+ | `mixed_precision` | `fp16` | Use 16-bit precision for speed |
189
+ | `checkpointing_steps` | `150` | Save checkpoint every N steps |
190
+
191
+ ### Hyperparameter Tuning Tips
192
+
193
+ **Learning Rate**:
194
+ - Too high: Training unstable, poor quality
195
+ - Too low: Slow convergence, underfitting
196
+ - Recommended: `1e-4` to `1e-5`
197
+
198
+ **LoRA Rank**:
199
+ - Lower (4-8): Faster training, smaller files, less expressive
200
+ - Medium (16-32): Balanced (recommended)
201
+ - Higher (64-128): More expressive, larger files, risk of overfitting
202
+
203
+ **Training Steps**:
204
+ - Small dataset (20-50 images): 500-1000 steps
205
+ - Medium dataset (50-200 images): 1000-2000 steps
206
+ - Large dataset (200+ images): 2000-5000 steps
207
+
208
+ **Batch Size**:
209
+ - Depends on VRAM availability
210
+ - Effective batch size = `batch_size × gradient_accumulation_steps`
211
+ - Recommended effective batch size: 4-8
212
+
213
+ ---
214
+
215
+ ## 🚀 Running the Training
216
+
217
+ ### Option 1: Google Colab (Recommended for Beginners)
218
+
219
+ 1. **Open the Notebook**:
220
+ - Use our provided notebook: `diffusion_model_finetuning.ipynb`
221
+ - Or create new Colab notebook
222
+
223
+ 2. **Setup GPU**:
224
+ ```
225
+ Runtime → Change runtime type → GPU (T4)
226
+ ```
227
+
228
+ 3. **Mount Google Drive** (optional):
229
+ ```python
230
+ from google.colab import drive
231
+ drive.mount('/content/drive')
232
+ ```
233
+
234
+ 4. **Install Dependencies**:
235
+ ```python
236
+ !pip install -q diffusers transformers accelerate peft
237
+ ```
238
+
239
+ 5. **Upload Dataset**:
240
+ - Upload to Google Drive
241
+ - Or download from Hugging Face
242
+
243
+ 6. **Run Training**:
244
+ ```python
245
+ !accelerate launch train_text_to_image_lora.py \
246
+ --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
247
+ --train_data_dir="/content/drive/MyDrive/pepe-data" \
248
+ --max_train_steps=2000 \
249
+ --learning_rate=1e-4 \
250
+ --output_dir="./output"
251
+ ```
252
+
253
+ 7. **Monitor Progress**:
254
+ - Watch loss decrease
255
+ - Check validation images
256
+ - Save checkpoints to Drive
257
+
258
+
259
+ ### Validate the Results
+
+ ```python
+ # Generate a test image with the trained LoRA loaded
+ # (see "Test Locally" under Model Upload for how `pipe` is set up)
+ image = pipe("pepe_style_frog, wizard casting spells").images[0]
+ image.save("validation.png")
+ ```
263
+
264
+
265
+ ## 📤 Model Upload
266
+
267
+ ### Prepare for Upload
268
+
269
+ 1. **Test Locally**:
270
+ ```python
271
+ from diffusers import StableDiffusionPipeline
272
+
273
+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
274
+ pipe.load_lora_weights("./output")
275
+
276
+ # Test
277
+ image = pipe("pepe_style_frog, happy pepe").images[0]
278
+ image.save("test.png")
279
+ ```
280
+
281
+ 2. **Prepare Files**:
282
+ ```
283
+ output/
284
+ ├── pytorch_lora_weights.safetensors # Main file
285
+ ├── README.md # Model card
286
+ └── sample_images/ # Example outputs
287
+ ```
288
+
289
+ ### Upload to Hugging Face
290
+
291
+ 1. **Install Hub CLI**:
292
+ ```bash
293
+ pip install huggingface_hub
294
+ huggingface-cli login
295
+ ```
296
+
297
+ 2. **Create Model Card** (`README.md`):
298
+ ````markdown
299
+ ---
300
+ license: creativeml-openrail-m
301
+ base_model: runwayml/stable-diffusion-v1-5
302
+ tags:
303
+ - stable-diffusion
304
+ - lora
305
+ - text-to-image
306
+ ---
307
+
308
+ # Pepe LoRA Model
309
+
310
+ Fine-tuned LoRA for generating Pepe the Frog images.
311
+
312
+ ## Usage
313
+ ```python
314
+ from diffusers import StableDiffusionPipeline
315
+
316
+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
317
+ pipe.load_lora_weights("YOUR_USERNAME/your-model-name")
318
+
319
+ image = pipe("pepe_style_frog, happy pepe").images[0]
320
+ ```
321
+ ````
322
+
323
+ 3. **Upload**:
324
+ ```python
325
+ from huggingface_hub import HfApi
326
+
327
+ api = HfApi()
328
+ api.create_repo("YOUR_USERNAME/pepe-lora", repo_type="model")
329
+ api.upload_folder(
330
+ folder_path="./output",
331
+ repo_id="YOUR_USERNAME/pepe-lora",
332
+ repo_type="model"
333
+ )
334
+ ```
335
+
336
+
337
+ ### Common Issues
338
+
339
+ **Out of Memory**:
340
+ - Reduce `train_batch_size` to 1
341
+ - Enable `--gradient_checkpointing`
342
+ - Use `--mixed_precision="fp16"`
343
+ - Reduce image resolution
src/app.py CHANGED
@@ -1,4 +1,33 @@
1
- """Pepe the Frog Meme Generator - Main Application"""
2
 
3
  import streamlit as st
4
  from PIL import Image
@@ -36,7 +65,16 @@ st.markdown("""
36
 
37
 
38
  def init_session_state():
39
- """Initialize session state"""
40
  if 'generated_images' not in st.session_state:
41
  st.session_state.generated_images = []
42
  if 'generation_count' not in st.session_state:
@@ -47,7 +85,28 @@ def init_session_state():
47
 
48
  @st.cache_resource
49
  def load_generator(model_name: str = "Pepe Fine-tuned (LoRA)"):
50
- """Load and cache the generator based on selected model"""
51
  config = ModelConfig()
52
  model_config = config.AVAILABLE_MODELS[model_name]
53
 
@@ -71,7 +130,15 @@ def load_generator(model_name: str = "Pepe Fine-tuned (LoRA)"):
71
 
72
 
73
  def get_example_prompts():
74
- """Return example prompts"""
75
  return [
76
  "pepe the frog as a wizard casting spells",
77
  "pepe the frog coding on a laptop",
@@ -82,7 +149,29 @@ def get_example_prompts():
82
 
83
 
84
  def main():
85
- """Main application"""
86
  init_session_state()
87
 
88
  # Sidebar (needs to be first to define selected_model)
@@ -199,7 +288,8 @@ def main():
199
  if st.session_state.generated_images:
200
  placeholder.image(
201
  st.session_state.generated_images[-1],
202
- width='stretch'
203
  )
204
  else:
205
  placeholder.info("Your meme will appear here...")
 
1
+ """Pepe the Frog Meme Generator - Main Streamlit Application.
2
+
3
+ This is the main entry point for the web application. It provides a user-friendly
4
+ interface for generating Pepe memes using AI-powered Stable Diffusion models.
5
+
6
+ The application features:
7
+ - Model selection (multiple LoRA variants, LCM support)
8
+ - Style presets and raw prompt mode
9
+ - Advanced generation settings (steps, guidance, seed)
10
+ - Text overlay capability for meme creation
11
+ - Gallery system for viewing generated images
12
+ - Download functionality
13
+ - Progress tracking during generation
14
+
15
+ Application Structure:
16
+ 1. Page configuration and styling
17
+ 2. Session state initialization
18
+ 3. Model loading and caching
19
+ 4. Sidebar UI (model selection, settings)
20
+ 5. Main content area (prompt input, generation)
21
+ 6. Results display and download
22
+ 7. Gallery view
23
+
24
+ Usage:
25
+ Run with: streamlit run src/app.py
26
+ Access at: http://localhost:8501
27
+
28
+ Author: MJaheen
29
+ License: MIT
30
+ """
31
 
32
  import streamlit as st
33
  from PIL import Image
 
65
 
66
 
67
  def init_session_state():
68
+ """
69
+ Initialize Streamlit session state variables.
70
+
71
+ This function sets up persistent state across app reruns:
72
+ - generated_images: List of all generated images in current session
73
+ - generation_count: Counter for tracking total generations
74
+ - current_model: Currently selected model name for cache invalidation
75
+
76
+ Session state persists across reruns but is reset when the page is refreshed.
77
+ """
78
  if 'generated_images' not in st.session_state:
79
  st.session_state.generated_images = []
80
  if 'generation_count' not in st.session_state:
 
85
 
86
  @st.cache_resource
87
  def load_generator(model_name: str = "Pepe Fine-tuned (LoRA)"):
88
+ """
89
+ Load and cache the Stable Diffusion generator.
90
+
91
+ This function loads a PepeGenerator instance configured with the selected
92
+ model. It's cached using @st.cache_resource to avoid reloading the model
93
+ on every interaction, which would be very slow.
94
+
95
+ The cache is automatically invalidated when:
96
+ - The model_name parameter changes
97
+ - The user manually clears cache
98
+
99
+ Args:
100
+ model_name: Name of the model from AVAILABLE_MODELS dict.
101
+ Examples: "Pepe Fine-tuned (LoRA)", "Pepe + LCM (FAST)"
102
+
103
+ Returns:
104
+ PepeGenerator: Configured generator instance with loaded model.
105
+
106
+ Note:
107
+ Model loading can take 30-60 seconds on first load as it downloads
108
+ weights from Hugging Face (~4GB for base model + LoRA).
109
+ """
110
  config = ModelConfig()
111
  model_config = config.AVAILABLE_MODELS[model_name]
112
 
 
130
 
131
 
132
  def get_example_prompts():
133
+ """
134
+ Return a list of example prompts for inspiration.
135
+
136
+ These prompts are designed to work well with the fine-tuned Pepe model
137
+ and demonstrate various styles, activities, and scenarios.
138
+
139
+ Returns:
140
+ list: List of example prompt strings with trigger word and descriptions.
141
+ """
142
  return [
143
  "pepe the frog as a wizard casting spells",
144
  "pepe the frog coding on a laptop",
 
149
 
150
 
151
  def main():
152
+ """
153
+ Main application function that builds and runs the Streamlit UI.
154
+
155
+ This function orchestrates the entire application flow:
156
+ 1. Initializes session state
157
+ 2. Loads configuration and sets up sidebar controls
158
+ 3. Handles model selection and switching
159
+ 4. Processes user input (prompts, settings)
160
+ 5. Generates images when requested
161
+ 6. Displays results with download options
162
+ 7. Shows gallery of previous generations
163
+
164
+ The UI is organized into:
165
+ - Sidebar: Model selection, style presets, advanced settings
166
+ - Main area: Prompt input, generation button, results
167
+ - Bottom: Gallery view (expandable)
168
+
169
+ Flow:
170
+ User selects model → Enters prompt → Adjusts settings →
171
+ Clicks generate → Shows progress → Displays result →
172
+ Offers download → Adds to gallery
173
+ """
174
+ # Initialize session state for persistent data across reruns
175
  init_session_state()
176
 
177
  # Sidebar (needs to be first to define selected_model)
 
288
  if st.session_state.generated_images:
289
  placeholder.image(
290
  st.session_state.generated_images[-1],
291
+ use_column_width=True
293
  )
294
  else:
295
  placeholder.info("Your meme will appear here...")
src/model/config.py CHANGED
@@ -1,4 +1,10 @@
1
- """Configuration management for the meme generator"""
2
 
3
  from dataclasses import dataclass
4
  from typing import Optional
@@ -6,14 +12,45 @@ from typing import Optional
6
 
7
  @dataclass
8
  class ModelConfig:
9
- """Model configuration parameters"""
10
 
11
  # Available models
12
  AVAILABLE_MODELS: dict = None
13
 
14
  def __post_init__(self):
 
15
  if self.AVAILABLE_MODELS is None:
16
  self.AVAILABLE_MODELS = {
17
  "Pepe Fine-tuned (LoRA)": {
18
  "base": "runwayml/stable-diffusion-v1-5",
19
  "lora": "MJaheen/Pepe_The_Frog_model_v1_lora",
@@ -94,7 +131,7 @@ class ModelConfig:
94
  # Performance
95
  ENABLE_ATTENTION_SLICING: bool = True
96
  ENABLE_VAE_SLICING: bool = True
97
- FORCE_CPU: bool = True # Set to True to force CPU, False to use GPU if available
98
 
99
  # Available styles
100
  AVAILABLE_STYLES: tuple = (
 
1
+ """Configuration management for the Pepe meme generator.
2
+
3
+ This module defines all configuration parameters for model selection,
4
+ generation settings, and application behavior. The ModelConfig dataclass
5
+ provides a centralized configuration system with sensible defaults.
6
+
7
+ """
8
 
9
  from dataclasses import dataclass
10
  from typing import Optional
 
12
 
13
  @dataclass
14
  class ModelConfig:
15
+ """
16
+ Central configuration for model and generation parameters.
17
+
18
+ This dataclass contains all settings for model selection, generation
19
+ parameters, and optimization flags. It supports multiple models including
20
+ fine-tuned LoRA variants and fast LCM models.
21
+
22
+ Attributes:
23
+ AVAILABLE_MODELS: Dictionary of available model configurations
24
+ SELECTED_MODEL: Currently selected model name
25
+ BASE_MODEL: HuggingFace ID of the base Stable Diffusion model
26
+ LORA_PATH: Path or HuggingFace ID of LoRA weights
27
+ USE_LORA: Whether to load and use LoRA weights
28
+ USE_LCM: Whether to use LCM (Latent Consistency Model) for fast inference
29
+ LCM_LORA_PATH: Path to LCM-LoRA weights
30
+ TRIGGER_WORD: Trigger word to activate fine-tuned style
31
+ DEFAULT_STEPS: Default number of diffusion steps
32
+ DEFAULT_GUIDANCE: Default guidance scale (CFG)
33
+ DEFAULT_WIDTH: Default output image width
34
+ DEFAULT_HEIGHT: Default output image height
35
+ DEFAULT_NEGATIVE_PROMPT: Default negative prompt for all generations
36
+ FORCE_CPU: Force CPU mode (disable GPU)
37
+ ENABLE_XFORMERS: Enable memory-efficient attention
38
+ """
39
 
40
  # Available models
41
  AVAILABLE_MODELS: dict = None
42
 
43
  def __post_init__(self):
44
+ """
45
+ Initialize AVAILABLE_MODELS dictionary if not already set.
46
+
47
+ This method is called automatically after __init__. It populates
48
+ the AVAILABLE_MODELS dictionary with all supported model configurations.
49
+ Each model can have different base models, LoRA weights, and optimization flags.
50
+ """
51
  if self.AVAILABLE_MODELS is None:
52
  self.AVAILABLE_MODELS = {
53
+ # Primary fine-tuned model - Best quality, trained on Pepe dataset
54
  "Pepe Fine-tuned (LoRA)": {
55
  "base": "runwayml/stable-diffusion-v1-5",
56
  "lora": "MJaheen/Pepe_The_Frog_model_v1_lora",
 
131
  # Performance
132
  ENABLE_ATTENTION_SLICING: bool = True
133
  ENABLE_VAE_SLICING: bool = True
134
+ FORCE_CPU: bool = False # Set to True to force CPU, False to use GPU if available
135
 
136
  # Available styles
137
  AVAILABLE_STYLES: tuple = (
src/model/generator.py CHANGED
@@ -1,4 +1,16 @@
1
- """Pepe Meme Generator - Core generation logic"""
2
 
3
  from typing import Optional, List, Callable
4
  import torch
@@ -14,10 +26,43 @@ logger = logging.getLogger(__name__)
14
 
15
 
16
  class PepeGenerator:
17
- """Main generator class for creating Pepe memes"""
18
 
19
  def __init__(self, config: Optional[ModelConfig] = None):
20
- """Initialize the generator"""
21
  self.config = config or ModelConfig()
22
  self.device = self._get_device(self.config.FORCE_CPU)
23
  self.pipe = self._load_model(
@@ -153,28 +198,38 @@ class PepeGenerator:
153
  def generate(
154
  self,
155
  prompt: str,
156
- style: str = "default",
157
  negative_prompt: Optional[str] = None,
158
- num_inference_steps: int = 50,
159
  guidance_scale: float = 7.5,
160
- seed: Optional[int] = None,
161
  width: int = 512,
162
  height: int = 512,
163
- callback: Optional[Callable[[int, int], None]] = None,
164
- raw_prompt: bool = False,
165
- ) -> Image.Image:
166
- """Generate a single Pepe meme image
167
 
168
  Args:
169
- callback: Optional callback function (current_step, total_steps)
170
- raw_prompt: If True, use prompt as-is without modifications
171
  """
172
-
173
- # Apply style preset or use raw prompt
174
- if raw_prompt:
175
- enhanced_prompt = prompt
176
- else:
177
- enhanced_prompt = self._apply_style_preset(prompt, style)
178
 
179
  # Set default negative prompt
180
  if negative_prompt is None:
@@ -189,11 +244,11 @@ class PepeGenerator:
189
  logger.debug(f"Full prompt: {enhanced_prompt}")
190
  logger.debug(f"Model config - Base: {self.config.BASE_MODEL}, LoRA: {self.config.USE_LORA}")
191
 
192
- # Create callback wrapper if provided (using new API)
193
  callback_on_step_end_fn = None
194
- if callback:
195
  def callback_on_step_end_fn(pipe, step, timestep, callback_kwargs):
196
- callback(step + 1, num_inference_steps)
197
  return callback_kwargs
198
 
199
  # Generate image (removed autocast for CPU compatibility)
 
1
+ """Pepe Meme Generator - Core generation logic.
2
+
3
+ This module contains the main PepeGenerator class which handles:
4
+ - Loading and caching Stable Diffusion models
5
+ - Managing LoRA and LCM-LoRA adapters
6
+ - Configuring schedulers and optimizations
7
+ - Generating images from text prompts
8
+ - Progress tracking during generation
9
+
10
+ The generator supports multiple models, automatic GPU/CPU detection,
11
+ memory optimizations, and both standard and fast (LCM) inference modes.
12
+
13
+ """
14
 
15
  from typing import Optional, List, Callable
16
  import torch
 
26
 
27
 
28
  class PepeGenerator:
29
+ """
30
+ Main generator class for creating Pepe meme images.
31
+
32
+ This class manages the entire image generation pipeline including:
33
+ - Model loading and caching (with Streamlit cache_resource)
34
+ - LoRA and LCM-LoRA adapter management
35
+ - Scheduler configuration (DPM Solver or LCM)
36
+ - Memory optimizations (attention slicing, VAE slicing, xformers)
37
+ - Device management (automatic CUDA/CPU detection)
38
+ - Progress tracking callbacks
39
+
40
+ The generator is designed to work efficiently on both GPU and CPU,
41
+ with automatic optimizations based on available hardware.
42
+
43
+ Attributes:
44
+ config: ModelConfig instance with generation settings
45
+ device: Torch device ('cuda' or 'cpu')
46
+ pipe: Cached StableDiffusionPipeline instance
47
+ """
48
 
49
  def __init__(self, config: Optional[ModelConfig] = None):
50
+ """
51
+ Initialize the Pepe generator with configuration.
52
+
53
+ Sets up the generator by determining the compute device (GPU/CPU),
54
+ loading the model pipeline, and caching it for reuse. The model
55
+ loading is cached using Streamlit's cache_resource decorator to avoid
56
+ reloading on every interaction.
57
+
58
+ Args:
59
+ config: ModelConfig instance. If None, uses default configuration.
60
+
61
+ Example:
62
+ >>> config = ModelConfig()
63
+ >>> config.USE_LCM = True # Enable fast generation
64
+ >>> generator = PepeGenerator(config)
65
+ """
66
  self.config = config or ModelConfig()
67
  self.device = self._get_device(self.config.FORCE_CPU)
68
  self.pipe = self._load_model(
 
198
  def generate(
199
  self,
200
  prompt: str,
 
201
  negative_prompt: Optional[str] = None,
202
+ num_inference_steps: int = 25,
203
  guidance_scale: float = 7.5,
 
204
  width: int = 512,
205
  height: int = 512,
206
+ seed: Optional[int] = None,
207
+ progress_callback: Optional[Callable[[int, int], None]] = None
208
+ ) -> Image.Image:
209
+ """
210
+ Generate a Pepe meme image from a text prompt.
211
+
212
+ This method runs the diffusion process to generate an image based on
213
+ the provided text prompt. It supports various parameters to control
214
+ the generation quality, style, and randomness.
215
 
216
  Args:
217
+ prompt: Text description of the desired image. For best results with
218
+ the fine-tuned model, include the trigger word 'pepe_style_frog'.
219
+ negative_prompt: Text describing what to avoid in the image.
220
+ If None, uses default from config.
221
+ num_inference_steps: Number of denoising steps (4-8 for LCM, 20-50 normal).
222
+ guidance_scale: CFG scale (1.0-2.0 for LCM, 5.0-15.0 normal).
223
+ width: Output image width in pixels (must be divisible by 8).
224
+ height: Output image height in pixels (must be divisible by 8).
225
+ seed: Random seed for reproducible generation.
226
+ progress_callback: Optional callback(current_step, total_steps).
227
+
228
+ Returns:
229
+ PIL Image object containing the generated image.
230
  """
231
+ # Use the prompt as-is (style handling is done in app.py before calling generate)
232
+ enhanced_prompt = prompt
233
 
234
  # Set default negative prompt
235
  if negative_prompt is None:
 
244
  logger.debug(f"Full prompt: {enhanced_prompt}")
245
  logger.debug(f"Model config - Base: {self.config.BASE_MODEL}, LoRA: {self.config.USE_LORA}")
246
 
247
+ # Create callback wrapper if provided (using new diffusers API)
248
  callback_on_step_end_fn = None
249
+ if progress_callback:
250
  def callback_on_step_end_fn(pipe, step, timestep, callback_kwargs):
251
+ progress_callback(step + 1, num_inference_steps)
252
  return callback_kwargs
253
 
254
  # Generate image (removed autocast for CPU compatibility)
src/utils/image_processor.py CHANGED
@@ -1,4 +1,16 @@
1
- """Image processing utilities"""
2
 
3
  from PIL import Image, ImageDraw, ImageFont, ImageEnhance
4
  from typing import Optional, Tuple
@@ -8,7 +20,18 @@ logger = logging.getLogger(__name__)
8
 
9
 
10
  class ImageProcessor:
11
- """Handles image post-processing"""
12
 
13
  @staticmethod
14
  def add_meme_text(
@@ -18,7 +41,27 @@ class ImageProcessor:
18
  font_size: int = 40,
19
  font_path: Optional[str] = None,
20
  ) -> Image.Image:
21
- """Add classic meme text to image"""
22
 
23
  img = image.copy()
24
  draw = ImageDraw.Draw(img)
@@ -159,7 +202,34 @@ class ImageProcessor:
159
  sharpness: float = 1.2,
160
  contrast: float = 1.1,
161
  ) -> Image.Image:
162
- """Apply enhancement filters"""
163
 
164
  # Sharpen
165
  enhancer = ImageEnhance.Sharpness(image)
 
1
+ """Image Processing Utilities for Meme Creation.
2
+
3
+ This module provides utilities for post-processing generated images:
4
+ - Adding classic meme text with outlines
5
+ - Adding signatures/watermarks
6
+ - Enhancing image quality (sharpness, contrast)
7
+
8
+ All methods are static and can be used without instantiation.
9
+ The ImageProcessor class acts as a namespace for image manipulation functions.
10
+
11
+ Author: MJaheen
12
+ License: MIT
13
+ """
14
 
15
  from PIL import Image, ImageDraw, ImageFont, ImageEnhance
16
  from typing import Optional, Tuple
 
20
 
21
 
22
  class ImageProcessor:
23
+ """
24
+ Static utility class for image post-processing operations.
25
+
26
+ This class provides methods for enhancing generated images with meme text,
27
+ signatures, and quality improvements. All methods are static and work with
28
+ PIL Image objects.
29
+
30
+ Methods:
31
+ add_meme_text: Add top/bottom text in classic meme style
32
+ add_signature: Add watermark/signature to image
33
+ enhance_image: Apply sharpness and contrast enhancements
34
+ """
35
 
36
  @staticmethod
37
  def add_meme_text(
 
41
  font_size: int = 40,
42
  font_path: Optional[str] = None,
43
  ) -> Image.Image:
44
+ """
45
+ Add classic Impact-font meme text with white text and black outline.
46
+
47
+ Creates the traditional meme format with text at the top and/or bottom
48
+ of the image. Text is automatically converted to uppercase and rendered
49
+ with a thick black outline for readability on any background.
50
+
51
+ Args:
52
+ image: Input PIL Image to add text to
53
+ top_text: Text to display at top of image (default: "")
54
+ bottom_text: Text to display at bottom of image (default: "")
55
+ font_size: Size of the font in points (default: 40)
56
+ font_path: Optional path to custom font file (default: uses Impact)
57
+
58
+ Returns:
59
+ PIL Image with meme text overlay (copy of original, not modified in-place)
60
+
61
+ Note:
62
+ Falls back to default font if Impact font is not found.
63
+ Text is centered horizontally automatically.
64
+ """
65
 
66
  img = image.copy()
67
  draw = ImageDraw.Draw(img)
 
202
  sharpness: float = 1.2,
203
  contrast: float = 1.1,
204
  ) -> Image.Image:
205
+ """
206
+ Apply sharpness and contrast enhancements to improve image quality.
207
+
208
+ This method applies PIL's ImageEnhance filters to make the image
209
+ crisper and more vibrant. Useful for post-processing AI-generated
210
+ images which can sometimes appear slightly soft.
211
+
212
+ Args:
213
+ image: Input PIL Image to enhance
214
+ sharpness: Sharpness multiplier (default: 1.2)
215
+ - 0.0: Blurred
216
+ - 1.0: Original sharpness
217
+ - 2.0: Very sharp
218
+ Recommended range: 1.0-1.5
219
+ contrast: Contrast multiplier (default: 1.1)
220
+ - 0.0: Gray
221
+ - 1.0: Original contrast
222
+ - 2.0: High contrast
223
+ Recommended range: 1.0-1.3
224
+
225
+ Returns:
226
+ Enhanced PIL Image (a new image; the input is not modified in place)
227
+
228
+ Example:
229
+ >>> image = Image.open("soft_image.png")
230
+ >>> enhanced = ImageProcessor.enhance_image(image, sharpness=1.3, contrast=1.2)
231
+ >>> enhanced.save("sharp_image.png")
232
+ """
233
 
234
  # Sharpen
235
  enhancer = ImageEnhance.Sharpness(image)