🎨 Stable Diffusion & CatVTON Implementation


A comprehensive implementation of Stable Diffusion from scratch with CatVTON virtual try-on capabilities


Overview

This project implements Stable Diffusion from scratch in PyTorch and extends it with CatVTON, a virtual try-on method that fits a garment image onto a photo of a person.

  • Complete Stable Diffusion pipeline (Branch: main)
  • CatVTON virtual try-on extension (Branch: CatVTON)
  • DDPM-based denoising, VAE, and custom attention
  • Inpainting and text-to-image capabilities

Project Structure

stable-diffusion/
├── Core Components
│   ├── attention.py          # Attention mechanisms
│   ├── clip.py               # CLIP model
│   ├── ddpm.py               # DDPM sampler
│   ├── decoder.py            # VAE decoder
│   ├── encoder.py            # VAE encoder
│   ├── diffusion.py          # Diffusion logic
│   ├── model.py              # Weight loading
│   └── pipeline.py           # Main pipeline logic
│
├── Utilities & Interface
│   ├── interface.py          # Interactive script
│   ├── model_converter.py    # Weight conversion utilities
│   └── requirements.txt      # Python dependencies
│
├── Data & Models
│   ├── vocab.json
│   ├── merges.txt
│   ├── inkpunk-diffusion-v1.ckpt
│   └── sd-v1-5-inpainting.ckpt
│
├── Sample Data
│   ├── person.jpg
│   ├── garment.jpg
│   ├── agnostic_mask.png
│   ├── dog.jpg
│   ├── image.png
│   └── zalando-hd-resized.zip
│
└── Notebooks & Docs
    ├── test.ipynb
    └── README.md

Features

Stable Diffusion Core

  • From-scratch implementation with modular architecture
  • Custom CLIP encoder integration
  • Latent space generation using VAE
  • DDPM sampling process
  • Self-attention mechanisms for denoising
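
As an illustration of the DDPM sampling process listed above, the sketch below builds a noise schedule and applies the forward noising step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The function names are hypothetical and are not taken from ddpm.py. The schedule values (1000 steps, betas from 0.00085 to 0.0120, linear in square-root space) are the ones commonly used with Stable Diffusion v1 and are assumed here, not confirmed from this repository. Plain Python lists stand in for tensors.

```python
import math


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.0120):
    """Betas spaced linearly in sqrt space, then squared (assumed SD v1 schedule)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, noise, t, alpha_bars):
    """Forward process: mix the clean sample with noise at timestep t."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]
```

Because alpha_bar_t shrinks as t grows, later timesteps keep less of the original signal and more of the noise. The sampler reverses this process step by step.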

CatVTON Capabilities

  • Virtual try-on using inpainting
  • Pose-aligned garment fitting
  • Segmentation-mask-based garment overlay
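
The mask-based overlay can be pictured as a per-pixel blend: where the mask is 1 the generated garment pixel is used, and elsewhere the original person pixel is kept. The following is a toy sketch on nested lists of grayscale values, with a hypothetical function name. The repository's pipeline presumably works on tensors in latent space, which is not shown here.

```python
def composite_with_mask(person, generated, mask):
    """Blend two equally sized 2-D images using a mask with values in [0, 1]."""
    if not (len(person) == len(generated) == len(mask)):
        raise ValueError("person, generated, and mask must have the same height")
    blended = []
    for p_row, g_row, m_row in zip(person, generated, mask):
        if not (len(p_row) == len(g_row) == len(m_row)):
            raise ValueError("all rows must have the same width")
        # mask == 1 takes the generated pixel, mask == 0 keeps the person pixel
        blended.append([m * g + (1.0 - m) * p for p, g, m in zip(p_row, g_row, m_row)])
    return blended
```

Fractional mask values give a soft transition at the garment boundary.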

Setup & Installation

Prerequisites

  • Python 3.10.9
  • CUDA-compatible GPU
  • Git, plus Conda or venv

Clone Repository

git clone https://github.com/Harsh-Kesharwani/stable-diffusion.git
cd stable-diffusion
git checkout CatVTON  # for try-on features

Create Environment

conda create -n stable-diffusion python=3.10.9
conda activate stable-diffusion

Install Requirements

pip install -r requirements.txt

Test Installation

python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.cuda.is_available())"
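
The two one-liners above can be folded into a single script that also reports the Python version. This is a minimal sketch; the function name `environment_report` is hypothetical and not part of the repository. It uses only `torch.__version__` and `torch.cuda.is_available()`, and it still runs when PyTorch is not installed.

```python
import importlib.util
import sys


def environment_report():
    """Collect the Python version, the PyTorch version, and CUDA availability."""
    report = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch": None,   # stays None if PyTorch is not importable
        "cuda": False,
    }
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
        except (ImportError, OSError):
            return report  # installed but broken; report it as missing
        report["torch"] = torch.__version__
        report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

A `cuda: False` result with a valid `torch` version usually points to a CPU-only PyTorch build or a driver problem, not to a failed install.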

Model Downloads

Tokenizer Files (from SD v1.4)

  • vocab.json
  • merges.txt

Download from: CompVis/stable-diffusion-v1-4

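Model Checkpoints

The project structure above lists two checkpoints:

  • inkpunk-diffusion-v1.ckpt
  • sd-v1-5-inpainting.ckpt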

Download Script

mkdir -p data
wget -O data/vocab.json "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json"
wget -O data/merges.txt "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/merges.txt"
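
Where `wget` is unavailable (for example on Windows), the same two files can be fetched with the Python standard library. This is a sketch under that assumption; the function names are hypothetical, and the URLs are the ones from the script above.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer"
TOKENIZER_FILES = ("vocab.json", "merges.txt")


def tokenizer_urls(base_url=BASE_URL, filenames=TOKENIZER_FILES):
    """Map each tokenizer filename to its download URL."""
    return {name: f"{base_url}/{name}" for name in filenames}


def download_tokenizer_files(dest_dir="data", overwrite=False):
    """Download the tokenizer files into dest_dir, skipping files already present."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, url in tokenizer_urls().items():
        target = dest / name
        if overwrite or not target.exists():
            urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

Calling `download_tokenizer_files()` from the repository root mirrors the shell script: it creates `data/` if needed and writes `vocab.json` and `merges.txt` into it.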

CatVTON Integration

The CatVTON extension performs clothing try-on with Stable Diffusion inpainting: the garment region of the person image is masked out and repainted with the target garment.

Highlights

  • sd-v1-5-inpainting.ckpt for image completion
  • Garment alignment to human pose
  • Agnostic segmentation mask usage
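
To make the role of the agnostic mask concrete, the sketch below blanks out the masked region of the person image and then stacks the person and garment along one spatial axis, which is the concatenation idea the CatVTON paper is named after. Everything here is illustrative: the function names are hypothetical, the choice of the height axis is an assumption, the zero mask for the garment half is an assumption about how the two halves are distinguished, and nested lists stand in for image or latent tensors. How this repository arranges its inputs is not confirmed by the README.

```python
def apply_agnostic_mask(person, mask):
    """Zero out pixels where mask == 1, i.e. the region to be repainted."""
    return [[p * (1 - m) for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(person, mask)]


def concat_rows(top, bottom):
    """Stack two 2-D arrays along the height axis; widths must match."""
    rows = [list(r) for r in top] + [list(r) for r in bottom]
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("all rows must have the same width")
    return rows


def build_tryon_inputs(person, garment, mask):
    """Return (image_input, mask_input) for a hypothetical inpainting model."""
    masked_person = apply_agnostic_mask(person, mask)
    image_input = concat_rows(masked_person, garment)
    # The garment half is fully known, so its mask is all zeros (assumption).
    garment_mask = [[0] * len(row) for row in garment]
    mask_input = concat_rows(mask, garment_mask)
    return image_input, mask_input
```

With this layout the model sees the intact garment next to the masked person, so it only has to repaint the masked half.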

Run the interface:

python interface.py

References

Papers

  • Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models
  • DDPM: Denoising Diffusion Probabilistic Models
  • CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

Author

Harsh Kesharwani

Passionate about AI, Computer Vision, and Generative Models


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

  • CompVis team for Stable Diffusion
  • HuggingFace for models and APIs
  • Zalando Research for the dataset
  • Open-source contributors and educators

⭐ Star this repo if you found it helpful!

Built with ❤️ by Harsh Kesharwani