# Stable Diffusion & CatVTON Implementation

<div align="center">
*A comprehensive implementation of Stable Diffusion from scratch with CatVTON virtual try-on capabilities*

</div>

---
## Table of Contents

* [Overview](#overview)
* [Project Structure](#project-structure)
* [Features](#features)
* [Setup & Installation](#setup--installation)
* [Model Downloads](#model-downloads)
* [CatVTON Integration](#catvton-integration)
* [References](#references)
* [Author](#author)
* [License](#license)
* [Acknowledgments](#acknowledgments)

---
## Overview

This project implements **Stable Diffusion from scratch** using PyTorch, extended with **CatVTON (Virtual Cloth Try-On)** for realistic fashion try-on.

* Complete Stable Diffusion pipeline (branch: `main`)
* CatVTON virtual try-on extension (branch: `CatVTON`)
* DDPM-based denoising, VAE, and custom attention
* Inpainting and text-to-image capabilities (see the usage sketch below)
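For orientation, the sketch below shows how the text-to-image path can be driven end to end. The module and function names (`model.preload_models_from_standard_weights`, `pipeline.generate`) and their keyword arguments are assumptions borrowed from the reference implementation this project follows; check `interface.py` and `test.ipynb` for the entry points actually used here.

```python
# Illustrative sketch only -- the function names and keyword arguments below
# follow the reference implementation this project is based on; the actual
# signatures in this repository's model.py / pipeline.py may differ.
import torch
from transformers import CLIPTokenizer

import model
import pipeline

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = CLIPTokenizer("data/vocab.json", merges_file="data/merges.txt")
models = model.preload_models_from_standard_weights(
    "data/inkpunk-diffusion-v1.ckpt", DEVICE
)

output = pipeline.generate(
    prompt="an astronaut riding a horse, nvinkpunk style",
    uncond_prompt="",            # negative prompt
    do_cfg=True,
    cfg_scale=7.5,               # classifier-free guidance strength
    sampler_name="ddpm",
    n_inference_steps=50,
    seed=42,
    models=models,
    device=DEVICE,
    tokenizer=tokenizer,
)
```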
---

## Project Structure

```text
stable-diffusion/
├── Core Components
│   ├── attention.py           # Attention mechanisms
│   ├── clip.py                # CLIP model
│   ├── ddpm.py                # DDPM sampler
│   ├── decoder.py             # VAE decoder
│   ├── encoder.py             # VAE encoder
│   ├── diffusion.py           # Diffusion logic
│   ├── model.py               # Weight loading
│   └── pipeline.py            # Main pipeline logic
│
├── Utilities & Interface
│   ├── interface.py           # Interactive script
│   ├── model_converter.py     # Weight conversion utilities
│   └── requirements.txt       # Python dependencies
│
├── Data & Models
│   ├── vocab.json
│   ├── merges.txt
│   ├── inkpunk-diffusion-v1.ckpt
│   └── sd-v1-5-inpainting.ckpt
│
├── Sample Data
│   ├── person.jpg
│   ├── garment.jpg
│   ├── agnostic_mask.png
│   ├── dog.jpg
│   ├── image.png
│   └── zalando-hd-resized.zip
│
└── Notebooks & Docs
    ├── test.ipynb
    └── README.md
```
---

## Features

### Stable Diffusion Core

* From-scratch implementation with a modular architecture
* Custom CLIP encoder integration
* Latent-space generation using a VAE
* DDPM sampling process (see the step sketch below)
* Self-attention mechanisms for denoising
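The DDPM bullet above refers to the standard reverse-process update from Ho et al. (2020). The snippet below is a generic, self-contained sketch of one denoising step, not an excerpt from `ddpm.py`:

```python
import torch

def ddpm_reverse_step(x_t, eps_pred, t, betas):
    """One generic DDPM reverse step x_t -> x_{t-1}, given predicted noise eps_pred."""
    alphas = 1.0 - betas                       # alpha_t = 1 - beta_t
    alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product up to t

    alpha_t, alpha_bar_t = alphas[t], alpha_bars[t]

    # Posterior mean: remove the predicted noise component and rescale.
    mean = (x_t - (1.0 - alpha_t) / torch.sqrt(1.0 - alpha_bar_t) * eps_pred) \
           / torch.sqrt(alpha_t)

    if t == 0:
        return mean                            # no noise added at the last step
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)
```

In the full sampler this step runs for every timestep from T-1 down to 0, with `eps_pred` produced by the diffusion model at each step.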
### CatVTON Capabilities

* Virtual try-on using inpainting
* Pose-aligned garment fitting
* Segmentation-mask-based garment overlay
---

## Setup & Installation

### Prerequisites

* Python 3.10.9
* CUDA-compatible GPU
* Git, plus Conda or venv

### Clone Repository

```bash
git clone https://github.com/Harsh-Kesharwani/stable-diffusion.git
cd stable-diffusion
git checkout CatVTON  # for try-on features
```
### Create Environment

```bash
conda create -n stable-diffusion python=3.10.9
conda activate stable-diffusion
```

### Install Requirements

```bash
pip install -r requirements.txt
```

### Test Installation

```bash
python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.cuda.is_available())"
```

---
## Model Downloads

### Tokenizer Files (from SD v1.4)

* `vocab.json`
* `merges.txt`

Download from: [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main/tokenizer)

### Model Checkpoints

* `inkpunk-diffusion-v1.ckpt`: [Inkpunk Model](https://huggingface.co/Envvi/Inkpunk-Diffusion/tree/main)
* `sd-v1-5-inpainting.ckpt`: [Inpainting Weights](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting)
### Download Script

```bash
mkdir -p data
wget -O data/vocab.json "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/vocab.json"
wget -O data/merges.txt "https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/tokenizer/merges.txt"
```
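The model checkpoints can be fetched the same way, or programmatically with `huggingface_hub`. A sketch is below; the filename inside the Hugging Face repository is an assumption, so verify it against the repo's file listing (and run `huggingface-cli login` first if the repository is gated):

```python
# Sketch: download a checkpoint into data/ via huggingface_hub instead of wget.
# The filename is an assumption -- check the repository's file list.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="stable-diffusion-v1-5/stable-diffusion-inpainting",
    filename="sd-v1-5-inpainting.ckpt",
    local_dir="data",
)
```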
---

## CatVTON Integration

The CatVTON extension enables realistic cloth try-on using Stable Diffusion inpainting.

### Highlights

* `sd-v1-5-inpainting.ckpt` for masked image completion
* Garment alignment to the human pose
* Agnostic segmentation mask to target the clothing region (see the sketch below)
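A minimal sketch of how the agnostic mask comes into play: white pixels mark the clothing region to be repainted, so the person image is blanked there and the inpainting model fills the hole conditioned on the garment. The file names match the sample data shipped with the repository; the actual preprocessing lives in `interface.py` and may differ (resolution, normalization, garment conditioning).

```python
import numpy as np
from PIL import Image

# Sample inputs shipped with the repository (the 512x512 resolution is an assumption).
person = Image.open("person.jpg").convert("RGB").resize((512, 512))
mask = Image.open("agnostic_mask.png").convert("L").resize((512, 512))

# Binarize: white (>127) marks the clothing region to regenerate.
mask_np = (np.array(mask) > 127).astype(np.float32)[..., None]
person_np = np.array(person).astype(np.float32)

# Blank out the masked region; the inpainting model repaints it,
# conditioned on the garment image (garment.jpg).
masked_person = person_np * (1.0 - mask_np)
Image.fromarray(masked_person.astype(np.uint8)).save("masked_person.png")
```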
Run the interface:

```bash
python interface.py
```

---
## References

### Articles & Guides

* [Stable Diffusion from Scratch (Medium)](https://medium.com/@sayedebad.777/implementing-stable-diffusion-from-scratch-using-pytorch-f07d50efcd97)
* [YouTube: Diffusion Implementation](https://www.youtube.com/watch?v=ZBKpAp_6TGI)

### HuggingFace Resources

* [Stable Diffusion v1.5 Inpainting](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-inpainting)
* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)
* [Inkpunk Diffusion](https://huggingface.co/Envvi/Inkpunk-Diffusion)

### Papers

* Latent Diffusion: "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022)
* DDPM: "Denoising Diffusion Probabilistic Models" (Ho et al., 2020)
* CatVTON: "Concatenation Is All You Need for Virtual Try-On with Diffusion Models" (2024)

---
## Author

<div align="center">

**Harsh Kesharwani**

[GitHub](https://github.com/Harsh-Kesharwani) · [LinkedIn](https://www.linkedin.com/in/harsh-kesharwani/) · [Email](mailto:[email protected])

*Passionate about AI, Computer Vision, and Generative Models*

</div>
---

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

## Acknowledgments

* CompVis team for Stable Diffusion
* HuggingFace for models and APIs
* Zalando Research for the try-on dataset
* Open-source contributors and educators

---
<div align="center">

**⭐ Star this repo if you found it helpful!**

*Built with ❤️ by [Harsh Kesharwani](https://www.linkedin.com/in/harsh-kesharwani/)*

</div>