Add new features and fixes
Fix some issues, add documentation
- LICENSE +1 -1
- README.md +327 -56
- diffusion_model_finetuning.ipynb +482 -0
- docs/TRAINING.md +343 -0
- src/app.py +96 -6
- src/model/config.py +40 -3
- src/model/generator.py +76 -21
- src/utils/image_processor.py +74 -4
LICENSE
CHANGED

@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2025 MJaheen
+Copyright (c) 2025 MJaheen, [email protected]
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
README.md
CHANGED

@@ -9,109 +9,380 @@ app_file: src/app.py
 python_version: "3.11"
 ---
-- Base SD 1.5 - Standard Stable Diffusion
-- Dreamlike Photoreal 2.0 - Photorealistic style
-- Openjourney v4 - Artistic Midjourney-style
 - **Raw Prompt Mode**: Use exact prompts without automatic enhancements
-- "pepe coding on a laptop"
-- "pepe drinking coffee"
-- "smug pepe wearing sunglasses"
 ```bash
-# Clone
 git clone https://github.com/YOUR_USERNAME/pepe-meme-generator.git
 cd pepe-meme-generator
 pip install -r requirements.txt
-# Run
 streamlit run src/app.py
 ```
 pepe-meme-generator/
-├── src/
-│   ├── app.py
-│   ├── model/
-- Diffusion model architecture
-- Transfer learning with LoRA
-- Text-to-image synthesis
-## 🎓 🙏 Acknowledgments
 ## 📜 License
-MIT License
<div align="center">

# 🐸 Pepe the Frog AI Meme Generator

### Create custom Pepe memes using AI-powered Stable Diffusion with LoRA fine-tuning

[](https://www.python.org/downloads/)
[](https://streamlit.io)
[](https://opensource.org/licenses/MIT)
[](https://huggingface.co/MJaheen/Pepe_The_Frog_model_v1_lora)

[Demo](https://huggingface.co/spaces/MJaheen/Pepe-Meme-Generator) • [Documentation](./docs/) • [Training Guide](./docs/TRAINING.md) • [Report Bug](https://github.com/YOUR_USERNAME/pepe-meme-generator/issues)

</div>

---

## 📖 Table of Contents

- [Features](#-features)
- [Quick Start](#-quick-start)
- [Installation](#-installation)
- [Usage](#-usage)
- [Model Information](#-model-information)
- [Performance Optimization](#-performance-optimization)
- [Project Structure](#-project-structure)
- [Training](#-training-your-own-model)
- [Contributing](#-contributing)
- [License](#-license)
- [Acknowledgments](#-acknowledgments)

---

## ✨ Features

### 🎨 **Multiple AI Models**
- **Pepe Fine-tuned LoRA** - Custom trained on Pepe dataset (1600 steps)
- **Pepe + LCM (FAST)** - 8x faster generation with LCM technology
- **Tiny SD** - Lightweight model for faster CPU generation
- **Small SD** - Balanced speed and quality
- **Base SD 1.5** - Standard Stable Diffusion
- **Dreamlike Photoreal 2.0** - Photorealistic style
- **Openjourney v4** - Artistic Midjourney-inspired style

### ⚡ **Performance Features**
- **LCM Support**: Generate images in 6 steps (~30 seconds on CPU)
- **GPU Acceleration**: Automatic CUDA detection with xformers support
- **Memory Efficient**: Attention slicing and VAE slicing enabled

### 🎭 **Generation Features**
- **Style Presets**: Happy, sad, smug, angry, crying, and more
- **Raw Prompt Mode**: Use exact prompts without automatic enhancements
- **Text Overlays**: Add meme text with Impact font
- **Batch Generation**: Create multiple variations
- **Progress Tracking**: Real-time generation progress bar
- **Seed Control**: Reproducible generations with fixed seeds
- **Gallery System**: View and manage all generated memes

### 🎯 **User Experience**
- **Model Hot-Swapping**: Switch models without restart
- **Interactive UI**: Clean Streamlit interface
- **Example Prompts**: Built-in inspiration gallery
- **Download Support**: Save images with one click
- **Responsive Design**: Works on desktop and mobile

---

## 🚀 Quick Start

### Try Online (No Installation)

🌐 **[Open in Hugging Face Spaces](https://huggingface.co/spaces/MJaheen/Pepe-Meme-Generator)** - Run instantly in your browser!

### Local Installation

```bash
# 1. Clone the repository
git clone https://github.com/YOUR_USERNAME/pepe-meme-generator.git
cd pepe-meme-generator

# 2. Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Run the app
streamlit run src/app.py
```

The app will open in your browser at `http://localhost:8501`.

---
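The repository also ships a Dockerfile (see the project structure below), so a containerized run is possible. A minimal sketch, assuming the image's entrypoint starts the Streamlit server on its default port 8501; the image tag is an arbitrary choice:

```bash
# Build the image (tag name is illustrative)
docker build -t pepe-meme-generator .

# Run it, publishing Streamlit's default port
docker run -p 8501:8501 pepe-meme-generator
```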
## 📦 Installation

### System Requirements

- **Python**: 3.10 or higher
- **RAM**: 8GB minimum, 16GB recommended
- **GPU**: Optional (NVIDIA with CUDA for faster generation)
- **Storage**: ~5GB for models and dependencies

### Dependencies

```bash
# Core dependencies
pip install torch torchvision                  # PyTorch
pip install diffusers transformers accelerate  # Diffusion models
pip install streamlit                          # Web interface
pip install pillow numpy scipy                 # Image processing
pip install peft safetensors                   # LoRA support
```

Or install everything at once:

```bash
pip install -r requirements.txt
```

### GPU Setup (Optional but Recommended)

For NVIDIA GPUs with CUDA:

```bash
# Install PyTorch with CUDA support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install xformers for memory-efficient attention
pip install xformers
```

---
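To verify that PyTorch actually sees the GPU after installation, a quick check like this can save debugging time (a minimal sketch, not part of the app itself):

```python
import torch

# Reports whether CUDA is available and which device would be used
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```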
+
|
| 147 |
+
## 🎮 Usage
|
| 148 |
+
|
| 149 |
+
### Basic Usage
|
| 150 |
+
|
| 151 |
+
1. **Select a Model**: Choose from the dropdown (try "Pepe + LCM (FAST)" for speed)
|
| 152 |
+
2. **Enter a Prompt**: e.g., "pepe the frog as a wizard casting spells"
|
| 153 |
+
3. **Adjust Settings**: Steps (6 for LCM, 25 for normal), guidance scale, etc.
|
| 154 |
+
4. **Generate**: Click "Generate Meme" and wait
|
| 155 |
+
5. **Download**: Save your creation!
|
| 156 |
+
|
| 157 |
+
### Example Prompts
|
| 158 |
+
|
| 159 |
+
```
|
| 160 |
+
pepe_style_frog, wizard casting magical spells, detailed
|
| 161 |
+
pepe_style_frog, programmer coding on laptop, cyberpunk style
|
| 162 |
+
pepe_style_frog, drinking coffee at sunrise, peaceful
|
| 163 |
+
pepe_style_frog, wearing sunglasses, smug expression
|
| 164 |
+
pepe_style_frog, crying with rain, emotional, dramatic lighting
|
| 165 |
+
```
|
| 166 |
+
|
| 167 |
+
### Advanced Features
|
| 168 |
+
|
| 169 |
+
#### **Using LCM for Fast Generation**
|
| 170 |
+
1. Select "Pepe + LCM (FAST)" model
|
| 171 |
+
2. Use 6 steps (optimal for LCM)
|
| 172 |
+
3. Set guidance scale to 1.5
|
| 173 |
+
4. Generate in ~30 seconds!
|
| 174 |
+
|
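Outside the app, the same speedup can be reproduced with diffusers directly. This is a minimal sketch: the adapter `latent-consistency/lcm-lora-sdv1-5` is the public LCM-LoRA for SD 1.5, but how this project wires LCM internally may differ:

```python
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Swap in the LCM scheduler and load the public LCM-LoRA adapter
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# LCM works with very few steps and a low guidance scale
image = pipe(
    "pepe_style_frog, wizard casting magical spells, detailed",
    num_inference_steps=6,
    guidance_scale=1.5,
).images[0]
image.save("pepe_lcm.png")
```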
#### **Adding Text Overlays**
1. Expand the "Add Text" section
2. Enter top and bottom text
3. Text is automatically styled in Impact font
4. A signature "MJ" is added to the corner

#### **Reproducible Generations**
1. Enable "Fixed Seed" in Advanced Settings
2. Set a seed number (e.g., 42)
3. Same seed + prompt = same image

---
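The "same seed + prompt = same image" guarantee comes from seeding the diffusion noise. In raw diffusers terms it looks roughly like this sketch (using the base pipeline from the installation steps above):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# A torch.Generator with a fixed seed makes the initial latent noise
# deterministic, so the same seed + prompt reproduces the same image.
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipe(
    "pepe_style_frog, wearing sunglasses, smug expression",
    generator=generator,
).images[0]
```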
## 🤖 Model Information

### Fine-Tuned LoRA Model

**Model ID**: `MJaheen/Pepe_The_Frog_model_v1_lora`

**Training Details**:
- **Base Model**: Stable Diffusion v1.5
- **Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
- **Training Steps**: 2000
- **Resolution**: 512x512
- **Batch Size**: 1 (4 gradient accumulation steps)
- **Learning Rate**: 1e-4 (cosine schedule)
- **LoRA Rank**: 16
- **Precision**: Mixed FP16
- **Trigger Word**: `pepe_style_frog`

**Performance**:
- Quality: ⭐⭐⭐ (Good)
- Speed (CPU): ~4 minutes (25 steps)
- Speed (GPU): ~15 seconds (25 steps)

---
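Loading this adapter outside the app follows the standard diffusers LoRA flow; a minimal sketch (remember to include the trigger word in the prompt):

```python
from diffusers import StableDiffusionPipeline

# Load the base model, then attach the fine-tuned LoRA adapter
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_lora_weights("MJaheen/Pepe_The_Frog_model_v1_lora")

# The trigger word activates the fine-tuned style
image = pipe("pepe_style_frog, drinking coffee at sunrise, peaceful").images[0]
image.save("pepe.png")
```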
## 📁 Project Structure

```
pepe-meme-generator/
├── src/                               # Source code
│   ├── app.py                         # Main Streamlit application
│   ├── model/                         # Model management
│   │   ├── __init__.py
│   │   ├── config.py                  # Model configurations
│   │   └── generator.py               # Image generation logic
│   └── utils/                         # Utility functions
│       ├── __init__.py
│       └── image_processor.py         # Image processing utilities
├── docs/                              # Documentation
│   └── TRAINING.md                    # Model training guide
├── models/                            # Downloaded models (gitignored)
├── outputs/                           # Generated images (gitignored)
├── scripts/                           # Utility scripts
├── tests/                             # Test files
├── diffusion_model_finetuning.ipynb   # Training notebook
├── requirements.txt                   # Python dependencies
├── .gitignore                         # Git ignore rules
├── .dockerignore                      # Docker ignore rules
├── Dockerfile                         # Docker configuration
├── LICENSE                            # MIT License
└── README.md                          # This file
```

---

## 🎓 Training Your Own Model

Want to fine-tune your own Pepe model or create a different character?

### Quick Training Overview

```bash
# 1. Prepare your dataset (images + captions)
# 2. Run the training script
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="./your-data" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=2000 \
  --learning_rate=1e-4 \
  --lr_scheduler="cosine" \
  --output_dir="./output" \
  --rank=16
```

### Complete Training Guide

See **[docs/TRAINING.md](./docs/TRAINING.md)** for:
- Dataset preparation
- Training configuration
- Hyperparameter tuning
- Validation and testing
- Model upload to Hugging Face

Or check out the **[diffusion_model_finetuning.ipynb](./diffusion_model_finetuning.ipynb)** notebook!

---

## 🛠️ Technology Stack

### Core Technologies
- **[PyTorch](https://pytorch.org/)** - Deep learning framework
- **[Diffusers](https://github.com/huggingface/diffusers)** - Diffusion models library
- **[Transformers](https://github.com/huggingface/transformers)** - NLP models
- **[PEFT](https://github.com/huggingface/peft)** - Parameter-efficient fine-tuning (LoRA)
- **[Streamlit](https://streamlit.io/)** - Web UI framework

### AI/ML Components
- **Stable Diffusion 1.5** - Base diffusion model
- **LoRA** - Low-Rank Adaptation for efficient fine-tuning
- **LCM** - Latent Consistency Model for fast inference
- **DPM Solver** - Fast diffusion sampling

### Image Processing
- **Pillow (PIL)** - Image manipulation
- **NumPy** - Numerical operations
- **SciPy** - Scientific computing

---

## 🤝 Contributing

Contributions are welcome! Here's how you can help:

### Ways to Contribute
- 🐛 Report bugs
- 💡 Suggest new features
- 📝 Improve documentation
- 🎨 Add new style presets
- ⚡ Optimize performance
- 🧪 Add tests

### Development Setup

```bash
# Clone and set up
git clone https://github.com/YOUR_USERNAME/pepe-meme-generator.git
cd pepe-meme-generator
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Make your changes
# Test locally
streamlit run src/app.py

# Submit a pull request
```

---
## 🐛 Troubleshooting

### Common Issues

**Issue**: Out of memory error
**Solution**: Reduce resolution to 512x512, use CPU mode, or enable memory optimizations

**Issue**: Slow generation on CPU
**Solution**: Use the "Pepe + LCM (FAST)" model with 6 steps

**Issue**: Model not loading
**Solution**: Clear the Streamlit cache with the "Clear Cache & Reload" button

**Issue**: Import errors
**Solution**: Reinstall dependencies: `pip install -r requirements.txt --force-reinstall`

---
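The "memory optimizations" above map to standard diffusers pipeline calls. A sketch of what they typically mean (the exact switches this app flips likely live in its generator module):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Standard diffusers memory savers: trade a little speed for much less RAM/VRAM
pipe.enable_attention_slicing()  # compute attention in slices
pipe.enable_vae_slicing()        # decode images through the VAE in slices

# With xformers installed on a CUDA device, this helps further:
# pipe.enable_xformers_memory_efficient_attention()
```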
## 📜 License

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

### Model Licenses
- **Stable Diffusion 1.5**: CreativeML Open RAIL-M License
- **Pepe LoRA**: MIT License
- **Training Dataset**: Check [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)

---

## 🙏 Acknowledgments

### Special Thanks
- **[WorldQuant University](https://www.wqu.edu/ai-lab-computer-vision)** - AI/ML education and resources
- **[Hugging Face](https://huggingface.co/)** - Model hosting and the diffusers library
- **[Stability AI](https://stability.ai/)** - Stable Diffusion model
- **[Microsoft](https://github.com/microsoft/LoRA)** - LoRA technique
- **[iresidentevil](https://huggingface.co/iresidentevil)** - Pepe dataset

## 📞 Contact & Support

- **Issues**: [email protected]

---

## 🌟 Star History

If you find this project useful, please consider giving it a ⭐ star on GitHub!

---

<div align="center">

**Made with ❤️ by MJaheen**

*Generate Pepes responsibly! 🐸*

</div>
diffusion_model_finetuning.ipynb
ADDED

@@ -0,0 +1,482 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "q4KpnNL4lY6q"
      },
      "source": [
        "### Getting Ready"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#!pip install datasets\n",
        "#!pip uninstall -y diffusers\n",
        "!git clone https://github.com/huggingface/diffusers.git\n",
        "!pip install git+https://github.com/huggingface/diffusers.git\n",
        "#!pip install --upgrade transformers accelerate safetensors torch torchvision"
      ],
      "metadata": {
        "id": "yOvCmByVINi7",
        "collapsed": true
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/drive')\n"
      ],
      "metadata": {
        "id": "I4vsjgK2AbgI"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Add the trigger word to the dataset and create the training parameters\n",
        "\n",
        "import os\n",
        "import json\n",
        "from datasets import load_dataset\n",
        "from accelerate.utils import write_basic_config\n",
        "from huggingface_hub import create_repo, upload_folder\n",
        "\n",
        "# --- 2. Configuration ---\n",
        "# This is where you set all the important parameters for the training job.\n",
        "\n",
        "# Model and Dataset Parameters\n",
        "base_model_id = \"runwayml/stable-diffusion-v1-5\"\n",
        "dataset_name = \"iresidentevil/pepe_the_frog\"  # The original dataset\n",
        "text_column = \"prompt\"\n",
        "image_column = \"image\"\n",
        "trigger_word = \"pepe_style_frog\"  # The trigger word we decided on\n",
        "\n",
        "# Training Parameters\n",
        "output_dir = \"/content/drive/MyDrive/pepe-lora-sdxl-turbo_2\"  # Where the trained LoRA will be saved\n",
        "resolution = 512  # SD v1.5 works well at 512x512. Higher resolutions need more VRAM.\n",
        "learning_rate = 1e-4\n",
        "train_batch_size = 1  # Keep this at 1 for a small dataset to see each image.\n",
        "gradient_accumulation_steps = 4\n",
        "max_train_steps = 500  # A good starting point for a small dataset. Adjust as needed.\n",
        "checkpointing_steps = 100  # Save a checkpoint every 100 steps.\n",
        "\n",
        "# LoRA Specific Parameters\n",
        "lora_rank = 16  # Rank (dimension) of the LoRA. 16 is a good balance.\n",
        "\n",
        "# Hugging Face Hub Parameters\n",
        "hf_hub_repo_id = \"your-username/pepe-lora-sdxl-turbo\"  # Change to your Hub username and desired repo name\n",
        "push_to_hub = True  # Set to True to automatically upload your LoRA to the Hub\n",
        "\n",
        "\n",
        "# --- 3. Prepare Dataset in \"Image Folder\" format ---\n",
        "# This section creates a local folder with images and a metadata.jsonl file,\n",
        "# which is the format expected by the training script.\n",
        "\n",
        "print(\"Loading original dataset...\")\n",
        "dataset = load_dataset(dataset_name, split=\"train\")\n",
        "\n",
        "image_folder_path = \"/content/drive/MyDrive/pepe-data\"\n",
        "os.makedirs(image_folder_path, exist_ok=True)\n",
        "print(f\"Created directory for prepared data: {image_folder_path}\")\n",
        "\n",
        "metadata_file_path = os.path.join(image_folder_path, \"metadata.jsonl\")\n",
        "\n",
        "with open(metadata_file_path, \"w\") as f:\n",
        "    for i, example in enumerate(dataset):\n",
        "        # Get image and caption\n",
        "        image = example[image_column]\n",
        "        caption = example[text_column]\n",
        "\n",
        "        # Add the trigger word\n",
        "        full_caption = f\"{trigger_word} {caption}\"\n",
        "\n",
        "        # Save the image\n",
        "        image_filename = f\"image_{i}.png\"\n",
        "        image.save(os.path.join(image_folder_path, image_filename))\n",
        "\n",
        "        # Write the metadata entry\n",
        "        metadata_entry = {\n",
        "            \"file_name\": image_filename,\n",
        "            text_column: full_caption\n",
        "        }\n",
        "        f.write(json.dumps(metadata_entry) + \"\\n\")\n",
        "\n",
        "print(f\"Dataset prepared and saved in 'image folder' format at: {image_folder_path}\")\n",
        "\n",
        "\n",
        "# --- 4. Set up the Training Command ---\n",
        "# This command points to our correctly formatted image folder.\n",
        "write_basic_config()\n",
        "\n",
        "command = [\n",
        "    \"accelerate\", \"launch\",\n",
        "    \"train_text_to_image_lora.py\",\n",
        "    f\"--pretrained_model_name_or_path={base_model_id}\",\n",
        "    f\"--train_data_dir={image_folder_path}\",\n",
        "    f\"--caption_column={text_column}\",\n",
        "    f\"--image_column={image_column}\",\n",
        "    \"--dataloader_num_workers=8\",\n",
        "    f\"--resolution={resolution}\", \"--center_crop\", \"--random_flip\",\n",
        "    f\"--train_batch_size={train_batch_size}\",\n",
        "    f\"--gradient_accumulation_steps={gradient_accumulation_steps}\",\n",
        "    f\"--max_train_steps={max_train_steps}\",\n",
        "    f\"--learning_rate={learning_rate}\",\n",
        "    \"--lr_scheduler=constant\",\n",
        "    \"--lr_warmup_steps=0\",\n",
        "    f\"--output_dir={output_dir}\",\n",
        "    f\"--rank={lora_rank}\",\n",
        "    f\"--validation_prompt='{trigger_word} a sad frog in a blue hoodie, cartoon style'\",\n",
        "    f\"--checkpointing_steps={checkpointing_steps}\",\n",
        "    \"--checkpoints_total_limit=3\",\n",
        "]\n",
        "\n",
        "if push_to_hub:\n",
        "    command.extend([\"--push_to_hub\", f\"--hub_model_id={hf_hub_repo_id}\"])\n",
        "\n",
        "training_command_str = \" \".join(command)\n",
        "\n",
        "\n",
        "# --- 5. Execute the Training ---\n",
        "print(\"\\n\" + \"=\"*80)\n",
        "print(\" TRAINING COMMAND\")\n",
        "print(\"=\"*80)\n",
        "print(\"The following command will be executed in your terminal:\")\n",
        "print(training_command_str)\n",
        "print(\"\\n\" + \"=\"*80)\n",
        "print(\"To start training, copy the command above and paste it into your terminal.\")\n",
        "print(\"Make sure you are in the correct environment where the diffusers examples are located.\")\n",
        "print(\"You may need to clone the diffusers repo first: git clone https://github.com/huggingface/diffusers.git\")\n",
        "print(\"Then navigate to: cd diffusers/examples/text_to_image\")\n",
        "print(\"=\"*80)\n",
        "\n"
      ],
      "metadata": {
        "id": "RPv7Gv5h--SO"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yGDgzchblY6s"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import sys\n",
        "import datasets\n",
        "import diffusers\n",
        "import huggingface_hub\n",
        "import requests\n",
        "import torch\n",
        "from dotenv import load_dotenv\n",
        "from huggingface_hub import HfApi\n",
        "from IPython.display import display"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6hoZLPDalY6t"
      },
      "source": [
        "We'll print out the version numbers of the critical packages, to help with future reproducibility."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "CaRvn_celY6t"
      },
      "outputs": [],
      "source": [
        "print(\"Platform:\", sys.platform)\n",
        "print(\"Python version:\", sys.version)\n",
        "print(\"---\")\n",
        "print(\"datasets version: \", datasets.__version__)\n",
        "print(\"diffusers version: \", diffusers.__version__)\n",
        "print(\"huggingface_hub version: \", huggingface_hub.__version__)\n",
        "print(\"torch version:\", torch.__version__)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VLBQ_2A0lY6u"
      },
      "source": [
        "Let's check if a GPU is available. If not, this notebook will take a long time to run!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jWTKdjUDlY6u"
      },
      "outputs": [],
      "source": [
        "if torch.cuda.is_available():\n",
        "    device = \"cuda\"\n",
        "    dtype = torch.float16\n",
        "else:\n",
        "    device = \"cpu\"\n",
        "    dtype = torch.float32\n",
        "\n",
        "print(f\"Using {device} device with {dtype} data type.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RCI8s5uylY6u"
      },
      "source": [
        "### Load Stable Diffusion"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2RU4U5mulY6w"
      },
      "outputs": [],
      "source": [
        "MODEL_NAME = \"runwayml/stable-diffusion-v1-5\"\n",
        "\n",
        "pipeline = diffusers.AutoPipelineForText2Image.from_pretrained(\n",
        "    MODEL_NAME, torch_dtype=dtype\n",
        ")\n",
        "pipeline.to(device)\n",
        "\n",
        "print(type(pipeline))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BMvqxn99lY6w"
      },
      "source": [
        "Test the base model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-kBJqj9xlY6w"
      },
      "outputs": [],
      "source": [
        "images = pipeline([\"pepe the frog rolling eyes\"]).images\n",
        "\n",
        "for im in images:\n",
        "    display(im)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "HqZRLoajlY6x"
      },
      "outputs": [],
      "source": [
        "#DATASET_NAME = \"worldquant-university/maya-dataset-v1\"\n",
        "DATASET_NAME = \"iresidentevil/pepe_the_frog\"\n",
        "data_builder = datasets.load_dataset_builder(DATASET_NAME)\n",
        "\n",
        "print(data_builder.dataset_name)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4EeHRlBmlY6x"
      },
      "outputs": [],
      "source": [
        "print(data_builder.info.features)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rgXvHJJVlY6y"
      },
      "outputs": [],
      "source": [
        "print(data_builder.info.splits)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-L2YvGMnlY6y"
      },
      "outputs": [],
      "source": [
        "data = datasets.load_dataset(DATASET_NAME)\n",
        "\n",
        "print(data)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "k2iL94ILlY6z"
      },
      "outputs": [],
      "source": [
        "data[\"train\"][\"image\"]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6vBJgSPnlY6z"
      },
      "outputs": [],
      "source": [
        "# The values are PIL images, so they will be displayed\n",
        "# automatically by Jupyter.\n",
        "data[\"train\"][\"image\"][3]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Kbj0aOW9lY6z"
      },
      "outputs": [],
      "source": [
        "# Use dictionary indexing to look up the text values.\n",
        "data[\"train\"][\"prompt\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Q0RrkjXVlY60"
      },
      "source": [
        "### LoRA Fine-tuning"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "36Jc_ijlwD75"
      },
      "outputs": [],
      "source": [
        "%cd diffusers/examples/text_to_image\n",
        "\n",
        "!accelerate launch train_text_to_image_lora.py \\\n",
        "  --pretrained_model_name_or_path=\"runwayml/stable-diffusion-v1-5\" \\\n",
        "  --train_data_dir={image_folder_path} \\\n",
        "  --caption_column=\"prompt\" \\\n",
        "  --image_column=\"image\" \\\n",
        "  --resolution=512 --center_crop --random_flip \\\n",
        "  --train_batch_size=1 \\\n",
        "  --gradient_accumulation_steps=4 \\\n",
        "  --max_train_steps=2000 \\\n",
        "  --learning_rate=1e-4 \\\n",
        "  --lr_scheduler=\"cosine\" \\\n",
        "  --lr_warmup_steps=0 \\\n",
        "  --output_dir={output_dir} \\\n",
        "  --rank=16 \\\n",
        "  --validation_prompt=\"pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise\" \\\n",
        "  --seed=42 \\\n",
        "  --mixed_precision=\"fp16\" \\\n",
        "  --checkpointing_steps=150"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VKOcWmJ9lY62"
      },
      "source": [
        "### Load LoRA Weights"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "SBGjOCmTlY63"
      },
      "outputs": [],
      "source": [
        "pipeline.load_lora_weights(\n",
        "    output_dir,\n",
        "    weight_name=\"pytorch_lora_weights.safetensors\",\n",
        ")\n",
        "pipeline.safety_checker = None"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "RYRckHGLlY63"
      },
      "outputs": [],
      "source": [
        "images = pipeline([\"pepe_style_frog making fun of a rabbit racing a tortoise\"]).images\n",
        "\n",
        "for im in images:\n",
        "    display(im)"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
docs/TRAINING.md
ADDED

@@ -0,0 +1,343 @@
# 🎓 Model Training Guide

This guide covers how to fine-tune your own Stable Diffusion model using LoRA (Low-Rank Adaptation) for creating custom character models like our Pepe generator.

---

## 📖 Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Dataset Preparation](#dataset-preparation)
- [Training Configuration](#training-configuration)
- [Running the Training](#running-the-training)
- [Model Upload](#model-upload)

---

## 🎯 Overview

### What is LoRA?

**LoRA (Low-Rank Adaptation)** is a parameter-efficient fine-tuning technique that:
- ✅ Trains only a small fraction of parameters (~0.5% of the full model)
- ✅ Requires significantly less VRAM (~10GB vs 40GB+)
- ✅ Maintains base model quality while adding custom styles
- ✅ Produces small, portable adapter files (~100MB vs 4GB+)
- ✅ Can be combined with other LoRAs

### Our Training Setup

**Model**: Pepe the Frog LoRA
**Base**: Stable Diffusion v1.5
**Dataset**: [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
**Result**: [MJaheen/Pepe_The_Frog_model_v1_lora](https://huggingface.co/MJaheen/Pepe_The_Frog_model_v1_lora)
**Training Time**: ~2-3 hours on a T4 GPU (Google Colab)

---
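The reason for those savings: instead of updating a full weight matrix `W`, LoRA learns a low-rank update `W + (alpha/r)·B·A`, where `B` and `A` are tiny. A toy sketch of the idea (illustrative only; real training uses the peft library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA wrapper: freeze W, train only the low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # frozen pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768), rank=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 24576 trainable params vs 590592 in the full layer
```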
## 🛠️ Prerequisites

### Hardware Requirements

**Minimum**:
- GPU: NVIDIA GPU with 10GB+ VRAM (e.g., RTX 3080, T4)
- RAM: 16GB system RAM
- Storage: 20GB free space

**Recommended**:
- GPU: NVIDIA A100, V100, or RTX 4090
- RAM: 32GB system RAM
- Storage: 50GB+ SSD

**Cloud Options**:
- Google Colab (free T4 GPU)
- Kaggle Notebooks (free GPU)
- Lambda Labs
- RunPod
- Vast.ai

### Software Requirements

```bash
# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.31.0
pip install transformers==4.45.1
pip install accelerate==0.34.2
pip install peft>=0.11.0
pip install safetensors==0.4.4
pip install datasets
pip install bitsandbytes  # For the 8-bit Adam optimizer (optional)
```

---

## 📂 Dataset Preparation

### Dataset Structure

Your dataset should follow this structure:

```
dataset/
├── image_1.png
├── image_2.png
├── image_3.png
└── metadata.jsonl  # or metadata.csv
```

### Metadata Format

**Option 1: JSONL (Recommended)**

```jsonl
{"file_name": "image_1.png", "prompt": "pepe_style_frog, happy pepe smiling"}
{"file_name": "image_2.png", "prompt": "pepe_style_frog, sad pepe crying"}
{"file_name": "image_3.png", "prompt": "pepe_style_frog, pepe drinking coffee"}
```

**Option 2: CSV**

```csv
file_name,prompt
image_1.png,"pepe_style_frog, happy pepe smiling"
image_2.png,"pepe_style_frog, sad pepe crying"
image_3.png,"pepe_style_frog, pepe drinking coffee"
```

### Dataset Best Practices

1. **Image Quality**
   - Resolution: 512x512 or higher
   - Format: PNG or JPG
   - Clear, well-lit images
   - Varied poses and expressions

2. **Caption Quality**
   - Include the trigger word (e.g., `pepe_style_frog`)
   - Describe key features and actions
   - Be consistent in naming conventions
   - 5-15 words per caption is optimal

3. **Dataset Size**
   - Minimum: 20-50 images
   - Optimal: 100-500 images
   - More images = better generalization

4. **Diversity**
   - Various angles and poses
   - Different expressions
   - Multiple backgrounds
   - Different lighting conditions

### Our Pepe Dataset

We used **[iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)**, which contains:
- ~200 high-quality Pepe images
- Consistent 512x512 resolution
- Varied expressions and styles
- Pre-captioned with the trigger word

---
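A quick way to sanity-check the folder before training is to load it with the datasets library's built-in `imagefolder` loader, which picks up `metadata.jsonl` automatically. A minimal sketch; the path is a placeholder, and the extra metadata column surfaces under whatever name it has in the JSONL (here `prompt`):

```python
from datasets import load_dataset

# "imagefolder" pairs each image file with its metadata.jsonl entry
ds = load_dataset("imagefolder", data_dir="./dataset", split="train")

print(ds)               # should report features: image, prompt
print(ds[0]["prompt"])  # first caption, including the trigger word
```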
+
|
| 145 |
+
## ⚙️ Training Configuration
|
| 146 |
+
|
| 147 |
+
### Training Hyperparameters
|
| 148 |
+
|
| 149 |
+
Here's the exact configuration we used for the Pepe model:
|
| 150 |
+
|
| 151 |
+
```bash
|
| 152 |
+
accelerate launch train_text_to_image_lora.py \
|
| 153 |
+
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
|
| 154 |
+
--train_data_dir="/path/to/pepe-data" \
|
| 155 |
+
--caption_column="prompt" \
|
| 156 |
+
--image_column="image" \
|
| 157 |
+
--resolution=512 \
|
| 158 |
+
--center_crop \
|
| 159 |
+
--random_flip \
|
| 160 |
+
--train_batch_size=1 \
|
| 161 |
+
--gradient_accumulation_steps=4 \
|
| 162 |
+
--max_train_steps=2000 \
|
| 163 |
+
--learning_rate=1e-4 \
|
| 164 |
+
--lr_scheduler="cosine" \
|
| 165 |
+
--lr_warmup_steps=0 \
|
| 166 |
+
--output_dir="./output" \
|
| 167 |
+
--rank=16 \
|
| 168 |
+
--validation_prompt="pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise" \
|
| 169 |
+
--validation_epochs=5 \
|
| 170 |
+
--seed=42 \
|
| 171 |
+
--mixed_precision="fp16" \
|
| 172 |
+
--checkpointing_steps=150
|
| 173 |
+
```
|
| 174 |
+
|
| 175 |
+
### Parameter Explanation
|
| 176 |
+
|
| 177 |
+
| Parameter | Value | Description |
|
| 178 |
+
|-----------|-------|-------------|
|
| 179 |
+
| `pretrained_model_name_or_path` | `runwayml/stable-diffusion-v1-5` | Base model to fine-tune |
|
| 180 |
+
| `train_data_dir` | `/path/to/data` | Path to your dataset |
|
| 181 |
+
| `resolution` | `512` | Image resolution (512x512) |
|
| 182 |
+
| `train_batch_size` | `1` | Batch size per GPU |
|
| 183 |
+
| `gradient_accumulation_steps` | `4` | Effective batch size = 1 * 4 = 4 |
|
| 184 |
+
| `max_train_steps` | `2000` | Total training steps |
|
| 185 |
+
| `learning_rate` | `1e-4` | Initial learning rate |
|
| 186 |
+
| `lr_scheduler` | `cosine` | Learning rate schedule |
|
| 187 |
+
| `rank` | `16` | LoRA rank (higher = more parameters) |
|
| 188 |
+
| `mixed_precision` | `fp16` | Use 16-bit precision for speed |
|
| 189 |
+
| `checkpointing_steps` | `150` | Save checkpoint every N steps |
|
| 190 |
+
|
| 191 |
+
### Hyperparameter Tuning Tips
|
| 192 |
+
|
| 193 |
+
**Learning Rate**:
|
| 194 |
+
- Too high: Training unstable, poor quality
|
| 195 |
+
- Too low: Slow convergence, underfitting
|
| 196 |
+
- Recommended: `1e-4` to `1e-5`
|
| 197 |
+
|
| 198 |
+
**LoRA Rank**:
|
| 199 |
+
- Lower (4-8): Faster training, smaller files, less expressive
|
| 200 |
+
- Medium (16-32): Balanced (recommended)
|
| 201 |
+
- Higher (64-128): More expressive, larger files, risk of overfitting
|
| 202 |
+
|
| 203 |
+
**Training Steps**:
|
| 204 |
+
- Small dataset (20-50 images): 500-1000 steps
|
| 205 |
+
- Medium dataset (50-200 images): 1000-2000 steps
|
| 206 |
+
- Large dataset (200+ images): 2000-5000 steps
|
| 207 |
+
|
| 208 |
+
**Batch Size**:
|
| 209 |
+
- Depends on VRAM availability
|
| 210 |
+
- Effective batch size = `batch_size × gradient_accumulation_steps`
|
| 211 |
+
- Recommended effective batch size: 4-8
|
| 212 |
+
|
| 213 |
+
---
|
| 214 |
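To make the steps/epochs relationship concrete, here is the arithmetic for our setup (~200 images, effective batch size 4); a small sketch, not part of the training script:

```python
dataset_size = 200       # ~200 images in the Pepe dataset
effective_batch = 1 * 4  # train_batch_size * gradient_accumulation_steps

steps_per_epoch = dataset_size // effective_batch  # 50 steps per pass
epochs = 2000 / steps_per_epoch                    # 2000 max_train_steps ≈ 40 passes
print(steps_per_epoch, epochs)
```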
+
|
| 215 |
+
## 🚀 Running the Training
|
| 216 |
+
|
| 217 |
+
### Option 1: Google Colab (Recommended for Beginners)
|
| 218 |
+
|
| 219 |
+
1. **Open the Notebook**:
|
| 220 |
+
- Use our provided notebook: `diffusion_model_finetuning.ipynb`
|
| 221 |
+
- Or create new Colab notebook
|
| 222 |
+
|
| 223 |
+
2. **Setup GPU**:
|
| 224 |
+
```
|
| 225 |
+
Runtime → Change runtime type → GPU (T4)
|
| 226 |
+
```
|
| 227 |
+
|
| 228 |
+
3. **Mount Google Drive** (optional):
|
| 229 |
+
```python
|
| 230 |
+
from google.colab import drive
|
| 231 |
+
drive.mount('/content/drive')
|
| 232 |
+
```
|
| 233 |
+
|
| 234 |
+
4. **Install Dependencies**:
|
| 235 |
+
```python
|
| 236 |
+
!pip install -q diffusers transformers accelerate peft
|
| 237 |
+
```
|
| 238 |
+
|
| 239 |
+
5. **Upload Dataset**:
|
| 240 |
+
- Upload to Google Drive
|
| 241 |
+
- Or download from Hugging Face
|
| 242 |
+
|
| 243 |
+
6. **Run Training**:
|
| 244 |
+
```python
|
| 245 |
+
!accelerate launch train_text_to_image_lora.py \
|
| 246 |
+
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
|
| 247 |
+
--train_data_dir="/content/drive/MyDrive/pepe-data" \
|
| 248 |
+
--max_train_steps=2000 \
|
| 249 |
+
--learning_rate=1e-4 \
|
| 250 |
+
--output_dir="./output"
|
| 251 |
+
```
|
| 252 |
+
|
| 253 |
+
7. **Monitor Progress**:
|
| 254 |
+
- Watch loss decrease
|
| 255 |
+
- Check validation images
|
| 256 |
+
- Save checkpoints to Drive
|
| 257 |
+
|
| 258 |
+
|
| 259 |
+
### Generate test image
|
| 260 |
+
image = pipe("pepe_style_frog, wizard casting spells").images[0]
|
| 261 |
+
image.save("validation.png")
|
| 262 |
+
```
|
| 263 |
+
|
| 264 |
+
|
| 265 |
+
## 📤 Model Upload
|
| 266 |
+
|
| 267 |
+
### Prepare for Upload
|
| 268 |
+
|
| 269 |
+
1. **Test Locally**:
|
| 270 |
+
```python
|
| 271 |
+
from diffusers import StableDiffusionPipeline
|
| 272 |
+
|
| 273 |
+
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
|
| 274 |
+
pipe.load_lora_weights("./output")
|
| 275 |
+
|
| 276 |
+
# Test
|
| 277 |
+
image = pipe("pepe_style_frog, happy pepe").images[0]
|
| 278 |
+
image.save("test.png")
|
| 279 |
+
```
|
| 280 |
+
|
| 281 |
+
2. **Prepare Files**:
|
| 282 |
+
```
|
| 283 |
+
output/
|
| 284 |
+
├── pytorch_lora_weights.safetensors # Main file
|
| 285 |
+
├── README.md # Model card
|
| 286 |
+
└── sample_images/ # Example outputs
|
| 287 |
+
```
|
| 288 |
+
|
| 289 |
+
### Upload to Hugging Face
|
| 290 |
+
|
| 291 |
+
1. **Install Hub CLI**:
|
| 292 |
+
```bash
|
| 293 |
+
pip install huggingface_hub
|
| 294 |
+
huggingface-cli login
|
| 295 |
+
```
|
| 296 |
+
|
| 297 |
+
2. **Create Model Card** (`README.md`):
|
| 298 |
+
```markdown
|
| 299 |
+
---
|
| 300 |
+
license: creativeml-openrail-m
|
| 301 |
+
base_model: runwayml/stable-diffusion-v1-5
|
| 302 |
+
tags:
|
| 303 |
+
- stable-diffusion
|
| 304 |
+
- lora
|
| 305 |
+
- text-to-image
|
| 306 |
+
---
|
| 307 |
+
|
| 308 |
+
# Pepe LoRA Model
|
| 309 |
+
|
| 310 |
+
Fine-tuned LoRA for generating Pepe the Frog images.
|
| 311 |
+
|
| 312 |
+
## Usage
|
| 313 |
+
```python
|
| 314 |
+
from diffusers import StableDiffusionPipeline
|
| 315 |
+
|
| 316 |
+
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
|
| 317 |
+
pipe.load_lora_weights("YOUR_USERNAME/your-model-name")
|
| 318 |
+
|
| 319 |
+
image = pipe("pepe_style_frog, happy pepe").images[0]
|
| 320 |
+
```
|
| 321 |
+
```
|
| 322 |
+
|
| 323 |
+
3. **Upload**:
|
| 324 |
+
```python
|
| 325 |
+
from huggingface_hub import HfApi
|
| 326 |
+
|
| 327 |
+
api = HfApi()
|
| 328 |
+
api.create_repo("YOUR_USERNAME/pepe-lora", repo_type="model")
|
| 329 |
+
api.upload_folder(
|
| 330 |
+
folder_path="./output",
|
| 331 |
+
repo_id="YOUR_USERNAME/pepe-lora",
|
| 332 |
+
repo_type="model"
|
| 333 |
+
)
|
| 334 |
+
```
|
| 335 |
+
|
| 336 |
+
|
| 337 |
+
### Common Issues
|
| 338 |
+
|
| 339 |
+
**Out of Memory**:
|
| 340 |
+
- Reduce `train_batch_size` to 1
|
| 341 |
+
- Enable `--gradient_checkpointing`
|
| 342 |
+
- Use `--mixed_precision="fp16"`
|
| 343 |
+
- Reduce image resolution
|
src/app.py
CHANGED
@@ -1,4 +1,33 @@
-"""Pepe the Frog Meme Generator - Main Application
+"""Pepe the Frog Meme Generator - Main Streamlit Application.
+
+This is the main entry point for the web application. It provides a user-friendly
+interface for generating Pepe memes using AI-powered Stable Diffusion models.
+
+The application features:
+- Model selection (multiple LoRA variants, LCM support)
+- Style presets and raw prompt mode
+- Advanced generation settings (steps, guidance, seed)
+- Text overlay capability for meme creation
+- Gallery system for viewing generated images
+- Download functionality
+- Progress tracking during generation
+
+Application Structure:
+    1. Page configuration and styling
+    2. Session state initialization
+    3. Model loading and caching
+    4. Sidebar UI (model selection, settings)
+    5. Main content area (prompt input, generation)
+    6. Results display and download
+    7. Gallery view
+
+Usage:
+    Run with: streamlit run src/app.py
+    Access at: http://localhost:8501
+
+Author: MJaheen
+License: MIT
+"""
 
 import streamlit as st
 from PIL import Image

@@ -36,7 +65,16 @@ st.markdown("""
 
 
 def init_session_state():
-    """
+    """
+    Initialize Streamlit session state variables.
+
+    This function sets up persistent state across app reruns:
+    - generated_images: List of all generated images in current session
+    - generation_count: Counter for tracking total generations
+    - current_model: Currently selected model name for cache invalidation
+
+    Session state persists across reruns but is reset when the page is refreshed.
+    """
     if 'generated_images' not in st.session_state:
         st.session_state.generated_images = []
     if 'generation_count' not in st.session_state:

@@ -47,7 +85,28 @@ def init_session_state():
 
 @st.cache_resource
 def load_generator(model_name: str = "Pepe Fine-tuned (LoRA)"):
-    """
+    """
+    Load and cache the Stable Diffusion generator.
+
+    This function loads a PepeGenerator instance configured with the selected
+    model. It's cached using @st.cache_resource to avoid reloading the model
+    on every interaction, which would be very slow.
+
+    The cache is automatically invalidated when:
+    - The model_name parameter changes
+    - The user manually clears cache
+
+    Args:
+        model_name: Name of the model from AVAILABLE_MODELS dict.
+            Examples: "Pepe Fine-tuned (LoRA)", "Pepe + LCM (FAST)"
+
+    Returns:
+        PepeGenerator: Configured generator instance with loaded model.
+
+    Note:
+        Model loading can take 30-60 seconds on first load as it downloads
+        weights from Hugging Face (~4GB for base model + LoRA).
+    """
     config = ModelConfig()
     model_config = config.AVAILABLE_MODELS[model_name]

@@ -71,7 +130,15 @@ def load_generator(model_name: str = "Pepe Fine-tuned (LoRA)"):
 
 
 def get_example_prompts():
-    """
+    """
+    Return a list of example prompts for inspiration.
+
+    These prompts are designed to work well with the fine-tuned Pepe model
+    and demonstrate various styles, activities, and scenarios.
+
+    Returns:
+        list: List of example prompt strings with trigger word and descriptions.
+    """
     return [
         "pepe the frog as a wizard casting spells",
         "pepe the frog coding on a laptop",

@@ -82,7 +149,29 @@ def get_example_prompts():
 
 
 def main():
-    """
+    """
+    Main application function that builds and runs the Streamlit UI.
+
+    This function orchestrates the entire application flow:
+    1. Initializes session state
+    2. Loads configuration and sets up sidebar controls
+    3. Handles model selection and switching
+    4. Processes user input (prompts, settings)
+    5. Generates images when requested
+    6. Displays results with download options
+    7. Shows gallery of previous generations
+
+    The UI is organized into:
+    - Sidebar: Model selection, style presets, advanced settings
+    - Main area: Prompt input, generation button, results
+    - Bottom: Gallery view (expandable)
+
+    Flow:
+        User selects model → Enters prompt → Adjusts settings →
+        Clicks generate → Shows progress → Displays result →

@@ -199,7 +288,8 @@ def main():
         if st.session_state.generated_images:
             placeholder.image(
                 st.session_state.generated_images[-1],
-                width='stretch'
             )
         else:
             placeholder.info("Your meme will appear here...")
|
| 172 |
+
Offers download → Adds to gallery
|
| 173 |
+
"""
|
| 174 |
+
# Initialize session state for persistent data across reruns
|
| 175 |
init_session_state()
|
| 176 |
|
| 177 |
# Sidebar (needs to be first to define selected_model)
|
|
|
|
| 288 |
if st.session_state.generated_images:
|
| 289 |
placeholder.image(
|
| 290 |
st.session_state.generated_images[-1],
|
| 291 |
+
#width='stretch'
|
| 292 |
+
st.image(img, use_column_width=True)
|
| 293 |
)
|
| 294 |
else:
|
| 295 |
placeholder.info("Your meme will appear here...")
|
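The `@st.cache_resource` usage documented above is the standard Streamlit idiom for expensive, non-picklable resources such as a diffusion pipeline. A self-contained sketch of the same pattern (the `DummyModel` class is a stand-in, not part of this repo):

```python
import time
import streamlit as st

class DummyModel:
    """Stand-in for an expensive resource such as PepeGenerator."""
    def __init__(self, name: str):
        time.sleep(2)  # simulate slow model loading
        self.name = name

@st.cache_resource
def load_model(name: str) -> DummyModel:
    # Body runs once per distinct `name`; later reruns reuse the cached object.
    return DummyModel(name)

model = load_model(st.sidebar.selectbox("Model", ["fast", "quality"]))
st.write(f"Loaded: {model.name}")
```

Because the cache key includes the function arguments, picking a different model in the sidebar naturally triggers a fresh load, which is exactly the invalidation behavior described in the `load_generator` docstring.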
src/model/config.py
CHANGED

@@ -1,4 +1,10 @@
-"""Configuration management for the meme generator"""
+"""Configuration management for the Pepe meme generator.
+
+This module defines all configuration parameters for model selection,
+generation settings, and application behavior. The ModelConfig dataclass
+provides a centralized configuration system with sensible defaults.
+
+"""

from dataclasses import dataclass
from typing import Optional

@@ -6,14 +12,45 @@ from typing import Optional

@dataclass
class ModelConfig:
-    """
+    """
+    Central configuration for model and generation parameters.
+
+    This dataclass contains all settings for model selection, generation
+    parameters, and optimization flags. It supports multiple models including
+    fine-tuned LoRA variants and fast LCM models.
+
+    Attributes:
+        AVAILABLE_MODELS: Dictionary of available model configurations
+        SELECTED_MODEL: Currently selected model name
+        BASE_MODEL: HuggingFace ID of the base Stable Diffusion model
+        LORA_PATH: Path or HuggingFace ID of LoRA weights
+        USE_LORA: Whether to load and use LoRA weights
+        USE_LCM: Whether to use LCM (Latent Consistency Model) for fast inference
+        LCM_LORA_PATH: Path to LCM-LoRA weights
+        TRIGGER_WORD: Trigger word to activate fine-tuned style
+        DEFAULT_STEPS: Default number of diffusion steps
+        DEFAULT_GUIDANCE: Default guidance scale (CFG)
+        DEFAULT_WIDTH: Default output image width
+        DEFAULT_HEIGHT: Default output image height
+        DEFAULT_NEGATIVE_PROMPT: Default negative prompt for all generations
+        FORCE_CPU: Force CPU mode (disable GPU)
+        ENABLE_XFORMERS: Enable memory-efficient attention
+    """

    # Available models
    AVAILABLE_MODELS: dict = None

    def __post_init__(self):
+        """
+        Initialize AVAILABLE_MODELS dictionary if not already set.
+
+        This method is called automatically after __init__. It populates
+        the AVAILABLE_MODELS dictionary with all supported model configurations.
+        Each model can have different base models, LoRA weights, and optimization flags.
+        """
        if self.AVAILABLE_MODELS is None:
            self.AVAILABLE_MODELS = {
+                # Primary fine-tuned model - Best quality, trained on Pepe dataset
                "Pepe Fine-tuned (LoRA)": {
                    "base": "runwayml/stable-diffusion-v1-5",
                    "lora": "MJaheen/Pepe_The_Frog_model_v1_lora",

@@ -94,7 +131,7 @@ class ModelConfig:
    # Performance
    ENABLE_ATTENTION_SLICING: bool = True
    ENABLE_VAE_SLICING: bool = True
-    FORCE_CPU: bool =
+    FORCE_CPU: bool = False  # Set to True to force CPU, False to use GPU if available

    # Available styles
    AVAILABLE_STYLES: tuple = (
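With `__post_init__` populating `AVAILABLE_MODELS`, switching models at runtime reduces to a dictionary lookup. A short sketch of how a caller might consume the config (the import path assumes the `src/` layout shown in this diff; only the "base" and "lora" keys are confirmed by the code above):

```python
from src.model.config import ModelConfig

config = ModelConfig()  # __post_init__ fills AVAILABLE_MODELS

# Names for a UI dropdown.
model_names = list(config.AVAILABLE_MODELS.keys())

# Resolve one entry to its base checkpoint and LoRA weights.
entry = config.AVAILABLE_MODELS["Pepe Fine-tuned (LoRA)"]
print(entry["base"])  # runwayml/stable-diffusion-v1-5
print(entry["lora"])  # MJaheen/Pepe_The_Frog_model_v1_lora
```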
src/model/generator.py
CHANGED

@@ -1,4 +1,16 @@
-"""Pepe Meme Generator - Core generation logic"""
+"""Pepe Meme Generator - Core generation logic.
+
+This module contains the main PepeGenerator class which handles:
+- Loading and caching Stable Diffusion models
+- Managing LoRA and LCM-LoRA adapters
+- Configuring schedulers and optimizations
+- Generating images from text prompts
+- Progress tracking during generation
+
+The generator supports multiple models, automatic GPU/CPU detection,
+memory optimizations, and both standard and fast (LCM) inference modes.
+
+"""

from typing import Optional, List, Callable
import torch

@@ -14,10 +26,43 @@ logger = logging.getLogger(__name__)


class PepeGenerator:
-    """
+    """
+    Main generator class for creating Pepe meme images.
+
+    This class manages the entire image generation pipeline including:
+    - Model loading and caching (with Streamlit cache_resource)
+    - LoRA and LCM-LoRA adapter management
+    - Scheduler configuration (DPM Solver or LCM)
+    - Memory optimizations (attention slicing, VAE slicing, xformers)
+    - Device management (automatic CUDA/CPU detection)
+    - Progress tracking callbacks
+
+    The generator is designed to work efficiently on both GPU and CPU,
+    with automatic optimizations based on available hardware.
+
+    Attributes:
+        config: ModelConfig instance with generation settings
+        device: Torch device ('cuda' or 'cpu')
+        pipe: Cached StableDiffusionPipeline instance
+    """

    def __init__(self, config: Optional[ModelConfig] = None):
-        """
+        """
+        Initialize the Pepe generator with configuration.
+
+        Sets up the generator by determining the compute device (GPU/CPU),
+        loading the model pipeline, and caching it for reuse. The model
+        loading is cached using Streamlit's cache_resource decorator to avoid
+        reloading on every interaction.
+
+        Args:
+            config: ModelConfig instance. If None, uses default configuration.
+
+        Example:
+            >>> config = ModelConfig()
+            >>> config.USE_LCM = True  # Enable fast generation
+            >>> generator = PepeGenerator(config)
+        """
        self.config = config or ModelConfig()
        self.device = self._get_device(self.config.FORCE_CPU)
        self.pipe = self._load_model(

@@ -153,28 +198,38 @@ class PepeGenerator:
    def generate(
        self,
        prompt: str,
-        style: str = "default",
        negative_prompt: Optional[str] = None,
-        num_inference_steps: int =
+        num_inference_steps: int = 25,
        guidance_scale: float = 7.5,
-        seed: Optional[int] = None,
        width: int = 512,
        height: int = 512,
-
-
-    ) -> Image
-        """
+        seed: Optional[int] = None,
+        progress_callback: Optional[Callable[[int, int], None]] = None
+    ) -> Image.Image:
+        """
+        Generate a Pepe meme image from a text prompt.
+
+        This method runs the diffusion process to generate an image based on
+        the provided text prompt. It supports various parameters to control
+        the generation quality, style, and randomness.

        Args:
-
-
+            prompt: Text description of the desired image. For best results with
+                the fine-tuned model, include the trigger word 'pepe_style_frog'.
+            negative_prompt: Text describing what to avoid in the image.
+                If None, uses default from config.
+            num_inference_steps: Number of denoising steps (4-8 for LCM, 20-50 normal).
+            guidance_scale: CFG scale (1.0-2.0 for LCM, 5.0-15.0 normal).
+            width: Output image width in pixels (must be divisible by 8).
+            height: Output image height in pixels (must be divisible by 8).
+            seed: Random seed for reproducible generation.
+            progress_callback: Optional callback(current_step, total_steps).
+
+        Returns:
+            PIL Image object containing the generated image.
        """
-
-
-        if raw_prompt:
-            enhanced_prompt = prompt
-        else:
-            enhanced_prompt = self._apply_style_preset(prompt, style)
+        # Use the prompt as-is (style handling is done in app.py before calling generate)
+        enhanced_prompt = prompt

        # Set default negative prompt
        if negative_prompt is None:

@@ -189,11 +244,11 @@ class PepeGenerator:
        logger.debug(f"Full prompt: {enhanced_prompt}")
        logger.debug(f"Model config - Base: {self.config.BASE_MODEL}, LoRA: {self.config.USE_LORA}")

-        # Create callback wrapper if provided (using new API)
+        # Create callback wrapper if provided (using new diffusers API)
        callback_on_step_end_fn = None
-        if
+        if progress_callback:
            def callback_on_step_end_fn(pipe, step, timestep, callback_kwargs):
-
+                progress_callback(step + 1, num_inference_steps)
                return callback_kwargs

        # Generate image (removed autocast for CPU compatibility)
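Putting the new `generate()` signature and the callback wrapper together, a caller can stream step-by-step progress into a UI. A minimal sketch under the signatures shown above (the prompt and seed are arbitrary); per the comment in the diff, the wrapper is presumably handed to the pipeline through diffusers' newer `callback_on_step_end` argument:

```python
import streamlit as st
from src.model.config import ModelConfig
from src.model.generator import PepeGenerator

generator = PepeGenerator(ModelConfig())
bar = st.progress(0)

def on_step(current: int, total: int) -> None:
    # Matches the callback(current_step, total_steps) contract in generate().
    bar.progress(current / total)

image = generator.generate(
    "pepe the frog coding on a laptop",
    num_inference_steps=25,
    seed=42,  # fixed seed for reproducible output
    progress_callback=on_step,
)
st.image(image)
```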
src/utils/image_processor.py
CHANGED

@@ -1,4 +1,16 @@
-"""Image
+"""Image Processing Utilities for Meme Creation.
+
+This module provides utilities for post-processing generated images:
+- Adding classic meme text with outlines
+- Adding signatures/watermarks
+- Enhancing image quality (sharpness, contrast)
+
+All methods are static and can be used without instantiation.
+The ImageProcessor class acts as a namespace for image manipulation functions.
+
+Author: MJaheen
+License: MIT
+"""

from PIL import Image, ImageDraw, ImageFont, ImageEnhance
from typing import Optional, Tuple

@@ -8,7 +20,18 @@ logger = logging.getLogger(__name__)


class ImageProcessor:
-    """
+    """
+    Static utility class for image post-processing operations.
+
+    This class provides methods for enhancing generated images with meme text,
+    signatures, and quality improvements. All methods are static and work with
+    PIL Image objects.
+
+    Methods:
+        add_meme_text: Add top/bottom text in classic meme style
+        add_signature: Add watermark/signature to image
+        enhance_image: Apply sharpness and contrast enhancements
+    """

    @staticmethod
    def add_meme_text(

@@ -18,7 +41,27 @@ class ImageProcessor:
        font_size: int = 40,
        font_path: Optional[str] = None,
    ) -> Image.Image:
-        """
+        """
+        Add classic Impact-font meme text with white text and black outline.
+
+        Creates the traditional meme format with text at the top and/or bottom
+        of the image. Text is automatically converted to uppercase and rendered
+        with a thick black outline for readability on any background.
+
+        Args:
+            image: Input PIL Image to add text to
+            top_text: Text to display at top of image (default: "")
+            bottom_text: Text to display at bottom of image (default: "")
+            font_size: Size of the font in points (default: 40)
+            font_path: Optional path to custom font file (default: uses Impact)
+
+        Returns:
+            PIL Image with meme text overlay (copy of original, not modified in-place)
+
+        Note:
+            Falls back to default font if Impact font is not found.
+            Text is centered horizontally automatically.
+        """

        img = image.copy()
        draw = ImageDraw.Draw(img)

@@ -159,7 +202,34 @@ class ImageProcessor:
        sharpness: float = 1.2,
        contrast: float = 1.1,
    ) -> Image.Image:
-        """
+        """
+        Apply sharpness and contrast enhancements to improve image quality.
+
+        This method applies PIL's ImageEnhance filters to make the image
+        crisper and more vibrant. Useful for post-processing AI-generated
+        images which can sometimes appear slightly soft.
+
+        Args:
+            image: Input PIL Image to enhance
+            sharpness: Sharpness multiplier (default: 1.2)
+                - 0.0: Blurred
+                - 1.0: Original sharpness
+                - 2.0: Very sharp
+                Recommended range: 1.0-1.5
+            contrast: Contrast multiplier (default: 1.1)
+                - 0.0: Gray
+                - 1.0: Original contrast
+                - 2.0: High contrast
+                Recommended range: 1.0-1.3
+
+        Returns:
+            Enhanced PIL Image (ImageEnhance returns a new image; the input is not modified)
+
+        Example:
+            >>> image = Image.open("soft_image.png")
+            >>> enhanced = ImageProcessor.enhance_image(image, sharpness=1.3, contrast=1.2)
+            >>> enhanced.save("sharp_image.png")
+        """

        # Sharpen
        enhancer = ImageEnhance.Sharpness(image)
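Taken together, these utilities form a small post-generation pipeline. A sketch using only the methods documented above ("pepe.png" is a placeholder path):

```python
from PIL import Image
from src.utils.image_processor import ImageProcessor

img = Image.open("pepe.png")  # placeholder input image

# Classic top/bottom meme text; returns a copy, the input stays untouched.
meme = ImageProcessor.add_meme_text(
    img, top_text="me", bottom_text="debugging at 3am"
)

# Mild sharpening and contrast within the recommended ranges.
final = ImageProcessor.enhance_image(meme, sharpness=1.3, contrast=1.2)
final.save("pepe_meme.png")
```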