# Stable Diffusion & CatVTON Implementation

A comprehensive implementation of Stable Diffusion from scratch with CatVTON virtual try-on capabilities.
## Table of Contents

- Overview
- Project Structure
- Features
- Setup & Installation
- Model Downloads
- CatVTON Integration
- References
- Author
- License
## Overview

This project implements Stable Diffusion from scratch in PyTorch, extended with CatVTON (Virtual Cloth Try-On) for realistic fashion try-on.

- Complete Stable Diffusion pipeline (branch: `main`)
- CatVTON virtual try-on extension (branch: `CatVTON`)
- DDPM-based denoising, VAE, and custom attention
- Inpainting and text-to-image capabilities
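The DDPM denoising process learns to reverse a fixed forward process that gradually adds Gaussian noise. A minimal NumPy sketch of the closed-form forward step q(x_t | x_0), using an illustrative linear beta schedule (the schedule values here are assumptions for demonstration, not the repo's exact configuration):

```python
import numpy as np

# Illustrative linear beta schedule over T timesteps
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product: abar_t = prod_{s<=t} (1 - beta_s)

def add_noise(x0: np.ndarray, t: int, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

x0 = np.ones((4, 4))
xt, eps = add_noise(x0, t=999)
# At t near T, alpha_bar is close to 0, so x_t is almost pure noise.
```

The sampler in `ddpm.py` runs this process in reverse, predicting and removing the noise `eps` step by step.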
## Project Structure

```
stable-diffusion/
├── Core Components
│   ├── attention.py          # Attention mechanisms
│   ├── clip.py               # CLIP model
│   ├── ddpm.py               # DDPM sampler
│   ├── decoder.py            # VAE decoder
│   ├── encoder.py            # VAE encoder
│   ├── diffusion.py          # Diffusion logic
│   ├── model.py              # Weight loading
│   └── pipeline.py           # Main pipeline logic
│
├── Utilities & Interface
│   ├── interface.py          # Interactive script
│   ├── model_converter.py    # Weight conversion utilities
│   └── requirements.txt      # Python dependencies
│
├── Data & Models
│   ├── vocab.json
│   ├── merges.txt
│   ├── inkpunk-diffusion-v1.ckpt
│   └── sd-v1-5-inpainting.ckpt
│
├── Sample Data
│   ├── person.jpg
│   ├── garment.jpg
│   ├── agnostic_mask.png
│   ├── dog.jpg
│   ├── image.png
│   └── zalando-hd-resized.zip
│
└── Notebooks & Docs
    ├── test.ipynb
    └── README.md
```
## Features

### Stable Diffusion Core

- From-scratch implementation with modular architecture
- Custom CLIP encoder integration
- Latent-space generation using a VAE
- DDPM sampling process
- Self-attention mechanisms for denoising

### CatVTON Capabilities

- Virtual try-on using inpainting
- Pose-aligned garment fitting
- Segmentation-mask-based garment overlay
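A segmentation-mask overlay keeps the person's pixels outside the garment region and takes generated content inside it. A minimal NumPy sketch of that compositing step (array names are illustrative, not the repo's API):

```python
import numpy as np

def composite(person: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend two images: keep `person` where mask == 0, take `generated` where mask == 1.

    person, generated: (H, W, 3) float arrays in [0, 1]
    mask:              (H, W) float array, 1 inside the garment region
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    return (1.0 - m) * person + m * generated

person = np.zeros((8, 8, 3))
generated = np.ones((8, 8, 3))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
out = composite(person, generated, mask)
# out is 1.0 inside the masked square and 0.0 elsewhere
```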
## Setup & Installation

### Prerequisites

- Python 3.10.9
- CUDA-compatible GPU
- Git, plus Conda or venv

### Clone the Repository

```bash
git clone https://github.com/Harsh-Kesharwani/stable-diffusion.git
cd stable-diffusion
git checkout CatVTON  # for try-on features
```

### Create an Environment

```bash
conda create -n stable-diffusion python=3.10.9
conda activate stable-diffusion
```

### Install Requirements

```bash
pip install -r requirements.txt
```

### Test the Installation

```bash
python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.cuda.is_available())"
```
## Model Downloads

### Tokenizer Files (from SD v1.4)

- `vocab.json`
- `merges.txt`

Download from: CompVis/stable-diffusion-v1-4

### Model Checkpoints

- `inkpunk-diffusion-v1.ckpt`: Inkpunk model
- `sd-v1-5-inpainting.ckpt`: Inpainting weights

### Download Script

```bash
mkdir -p data
wget -O data/vocab.json "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json"
wget -O data/merges.txt "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/merges.txt"
```
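After downloading, a quick sanity check that the tokenizer files parse correctly. This helper is illustrative (not part of the repo); the paths match what the script above writes:

```python
import json
from pathlib import Path

def check_tokenizer_files(data_dir: str = "data") -> dict:
    """Load vocab.json and merges.txt and return basic stats."""
    d = Path(data_dir)
    vocab = json.loads((d / "vocab.json").read_text(encoding="utf-8"))
    merges = (d / "merges.txt").read_text(encoding="utf-8").splitlines()
    # merges.txt may start with a "#version" header line; skip comments/blanks
    rules = [m for m in merges if m and not m.startswith("#")]
    return {"vocab_size": len(vocab), "merge_rules": len(rules)}
```

For the SD v1.x CLIP tokenizer, a successful download should report a vocabulary of roughly 49k entries.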
## CatVTON Integration

The CatVTON extension enables realistic cloth try-on using Stable Diffusion inpainting.

### Highlights

- `sd-v1-5-inpainting.ckpt` for image completion
- Garment alignment to human pose
- Agnostic segmentation mask usage

Run the interface:

```bash
python interface.py
```
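Inpainting samplers typically re-inject the known (unmasked) region at every denoising step: the original image is noised to the current timestep and blended with the model's latent. A minimal NumPy sketch of that blend under this common scheme (variable names are illustrative; the repo's actual loop lives in `pipeline.py`):

```python
import numpy as np

def inpaint_blend(x_t: np.ndarray, x0_known: np.ndarray, mask: np.ndarray,
                  alpha_bar_t: float, rng=np.random.default_rng(0)) -> np.ndarray:
    """One inpainting constraint step.

    x_t:       current model latent at timestep t
    x0_known:  clean latent of the original image
    mask:      1 where the model may generate (garment region), 0 where to keep
    """
    eps = rng.standard_normal(x0_known.shape)
    # Noise the known image to timestep t via q(x_t | x_0)
    known_t = np.sqrt(alpha_bar_t) * x0_known + np.sqrt(1.0 - alpha_bar_t) * eps
    # Model output inside the mask, known content outside it
    return mask * x_t + (1.0 - mask) * known_t

x_t = np.zeros((4, 4))
x0 = np.ones((4, 4))
mask = np.zeros((4, 4)); mask[:, :2] = 1.0
out = inpaint_blend(x_t, x0, mask, alpha_bar_t=1.0)
# With alpha_bar_t == 1.0 the known region is exactly x0:
# left half (masked) stays 0.0, right half (kept) is 1.0
```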
## References

### Articles & Guides

### HuggingFace Resources

### Papers

- Stable Diffusion: Latent Diffusion Models
- DDPM: Denoising Diffusion Probabilistic Models
- CatVTON: Category-aware Try-On Network
## Author

Harsh Kesharwani
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

- CompVis team for Stable Diffusion
- HuggingFace for models and APIs
- Zalando Research for the dataset
- Open-source contributors and educators
⭐ Star this repo if you found it helpful!

Built with ❤️ by Harsh Kesharwani