SDXL VAE v1.0 - Improved Variational Autoencoder

High-quality Variational Autoencoder (VAE) for Stable Diffusion XL (SDXL) models, featuring enhanced reconstruction quality and improved detail preservation.

Model Description

The SDXL VAE is an improved variational autoencoder component for Stable Diffusion XL that significantly enhances the quality of generated images. This VAE was specifically retrained by Stability AI with optimized training parameters to improve local, high-frequency details in generated images.

Key Improvements:

Enhanced Training: Trained with a larger batch size (256 vs 9) for better convergence
Exponential Moving Average (EMA): Weights tracked with an EMA for improved stability
Superior Reconstruction: Outperforms the original SD VAE on the reported reconstruction metrics (rFID, PSNR, SSIM, PSIM)
Detail Preservation: Significantly better at preserving fine details and textures
Face Quality: Trained on LAION-Aesthetics and LAION-Humans for improved human subject rendering

This VAE is compatible with all SDXL-based models and can be used as a drop-in replacement for the standard VAE to improve output quality.
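
The EMA weight tracking listed under Key Improvements keeps a smoothed copy of the weights next to the ones being trained. The update rule can be sketched in plain Python as below. The function name `ema_update` and the decay values are illustrative assumptions, not settings taken from the SDXL VAE training run.

```python
def ema_update(ema_weights, current_weights, decay=0.999):
    """Return the EMA weights after one training step.

    decay is a hypothetical value chosen for illustration only.
    """
    return [
        decay * ema + (1.0 - decay) * w
        for ema, w in zip(ema_weights, current_weights)
    ]


# Toy run: the tracked copy moves gradually toward the live weights.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)
print(ema)  # [0.875, 1.75]
```

A decay closer to 1 makes the tracked copy change more slowly, which is what smooths out step-to-step noise in the trained weights.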

Repository Contents

E:\huggingface\sdxl-vae\
├── vae\
│   └── sdxl\
│       └── sdxl-vae.safetensors          # 320 MB - SDXL VAE weights
├── .cache\                               # Cache directory
└── README.md                             # This file

Total Repository Size: ~320 MB

Model Files

File	Size	Format	Description
`sdxl-vae.safetensors`	320 MB	SafeTensors	SDXL VAE model weights

Hardware Requirements

Minimum Requirements

VRAM: 4 GB (with image generation model)
Disk Space: 400 MB
System RAM: 8 GB

Recommended Requirements

VRAM: 8+ GB for optimal performance
Disk Space: 500 MB with cache
System RAM: 16 GB

Performance Notes

The VAE itself uses minimal VRAM (~500 MB)
Total VRAM depends on the main diffusion model used
Encoding/decoding is fast and adds minimal overhead
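
As a rough cross-check of the weight footprint, the memory taken by the weights alone is the parameter count times the bytes per parameter. The sketch below assumes the 320 MB file holds only FP32 weights, which gives an approximate parameter count. Activations during encoding and decoding come on top of this and are not modeled. The names `BYTES_PER_PARAM` and `weight_megabytes` are hypothetical helpers.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_megabytes(num_params, dtype="float32"):
    """Approximate size of the weights alone, in MB (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Assumption: the 320 MB file stores nothing but FP32 weights.
approx_params = 320e6 / 4
print(weight_megabytes(approx_params, "float32"))  # 320.0
print(weight_megabytes(approx_params, "float16"))  # 160.0
```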

Usage Examples

With Diffusers Library (Recommended)

from diffusers import StableDiffusionXLPipeline, AutoencoderKL
import torch

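# Load the improved SDXL VAE from the local single-file checkpoint.
# The folder holds only sdxl-vae.safetensors (no config.json), so
# from_single_file is used instead of from_pretrained.
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/sdxl-vae/vae/sdxl/sdxl-vae.safetensors",
    torch_dtype=torch.float16
)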

# Load SDXL pipeline with custom VAE
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)

pipe = pipe.to("cuda")

# Generate image with improved VAE
prompt = "A majestic mountain landscape at sunset, highly detailed"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]

image.save("output.png")

Loading from Hugging Face Hub

from diffusers import AutoencoderKL
import torch

# Load directly from Hugging Face
vae = AutoencoderKL.from_pretrained(
    "stabilityai/sdxl-vae",
    torch_dtype=torch.float16
)

Replace VAE in Existing Pipeline

from diffusers import StableDiffusionXLPipeline, AutoencoderKL
import torch

# Load your existing SDXL pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "your-sdxl-model-path",
    torch_dtype=torch.float16
)

# Replace with improved VAE (single-file checkpoint, so use from_single_file)
improved_vae = AutoencoderKL.from_single_file(
    "E:/huggingface/sdxl-vae/vae/sdxl/sdxl-vae.safetensors",
    torch_dtype=torch.float16
)

pipe.vae = improved_vae
pipe = pipe.to("cuda")

# Generate with improved quality
image = pipe("detailed portrait photograph").images[0]

Manual Encoding/Decoding

from diffusers import AutoencoderKL
from PIL import Image
import torch
from torchvision import transforms

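# Load VAE from the local single-file checkpoint (the folder has no config.json).
# float32 is used here because the original SDXL VAE weights can overflow to NaN in float16.
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/sdxl-vae/vae/sdxl/sdxl-vae.safetensors",
    torch_dtype=torch.float32
).to("cuda")

# Load and preprocess image
image = Image.open("input.png").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])  # maps [0, 1] to [-1, 1]
])
image_tensor = transform(image).unsqueeze(0).to("cuda", dtype=torch.float32)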

# Encode to latent space
with torch.no_grad():
    latents = vae.encode(image_tensor).latent_dist.sample()
    latents = latents * vae.config.scaling_factor

# Decode back to image space
with torch.no_grad():
    latents = latents / vae.config.scaling_factor
    reconstructed = vae.decode(latents).sample

# Convert to PIL image
reconstructed = (reconstructed / 2 + 0.5).clamp(0, 1)
reconstructed = reconstructed.cpu().permute(0, 2, 3, 1).numpy()[0]
output_image = Image.fromarray((reconstructed * 255).astype("uint8"))
output_image.save("reconstructed.png")
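
The preprocessing and postprocessing steps in the example above map 8-bit pixel values to the [-1, 1] range the VAE expects and back again. A minimal pure-Python sketch of that mapping for a single value follows. The helper names `to_model_range` and `to_pixel` are hypothetical, and this sketch rounds to the nearest integer on the way back.

```python
def to_model_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1.0, 1.0]."""
    return pixel / 255.0 * 2.0 - 1.0


def to_pixel(value):
    """Map a model output in [-1.0, 1.0] back to an 8-bit pixel value."""
    clamped = min(max(value / 2.0 + 0.5, 0.0), 1.0)
    return int(round(clamped * 255.0))


print(to_model_range(0), to_model_range(255))  # -1.0 1.0
print(to_pixel(-1.0), to_pixel(1.0))           # 0 255
```

Values outside [-1, 1], which a decoder can produce, are clamped before conversion, matching the `clamp(0, 1)` step in the example.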

ComfyUI Integration

# In ComfyUI, use the "Load VAE" node
# Point to: E:\huggingface\sdxl-vae\vae\sdxl\sdxl-vae.safetensors
# Connect to your SDXL model's VAE input

Model Specifications

Architecture

Type: Variational Autoencoder (VAE)
Architecture: SDXL AutoencoderKL
Latent Channels: 4
Latent Dimension: 8× compression (1024×1024 → 128×128 latent)
Training Batch Size: 256 (vs 9 in original SD VAE)
Weight Tracking: Exponential Moving Average (EMA)

Technical Details

Format: SafeTensors (secure, efficient)
Precision: FP32 (native), compatible with FP16
Input Resolution: 1024×1024 native, supports variable sizes
Latent Space: 4-channel continuous latent representation
Compression Ratio: 8× spatial compression per dimension
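
The latent size follows directly from the figures above: each spatial dimension is divided by 8 and the result has 4 channels. The sketch below works this out for a given image size. The function names are hypothetical, and the check that sizes are multiples of 8 is a simplifying assumption of this sketch.

```python
def latent_shape(height, width, latent_channels=4, downscale=8):
    """Shape (C, H, W) of the latent for an image of the given size."""
    if height % downscale or width % downscale:
        raise ValueError("height and width should be multiples of the downscale factor")
    return (latent_channels, height // downscale, width // downscale)


def element_reduction(height, width, image_channels=3, latent_channels=4, downscale=8):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width, latent_channels, downscale)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(1024, 1024))       # (4, 128, 128)
print(element_reduction(1024, 1024))  # 48.0
```

So 8× compression per dimension means 64× fewer spatial positions, and 48× fewer values overall once the change from 3 to 4 channels is counted.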

Training Data

Datasets: LAION-Aesthetics + LAION-Humans
Focus: High-quality face and human subject reconstruction
Evaluation: COCO 2017 validation (256×256 images)

Performance Metrics

Compared to the original SD VAE, the SDXL VAE achieves:

rFID: Lower (better reconstruction quality)
PSNR: Higher (better signal quality)
SSIM: Higher (better structural similarity)
PSIM: Lower (better perceptual quality)
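
Of the metrics above, PSNR is the simplest to compute by hand: it is a log-scaled ratio of the maximum pixel value to the mean squared error, so higher means a closer reconstruction. A minimal pure-Python sketch on flat lists of 8-bit pixel values is shown below. The function name `psnr` and the sample values are illustrative, and rFID and PSIM are not covered because they need pretrained networks.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


pixels = [10, 20, 30, 40]            # made-up sample values
print(psnr(pixels, pixels))          # inf
print(psnr(pixels, [12, 18, 33, 39]))
```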

Performance Tips

Optimization Strategies

Use FP16 for Speed

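# FP16 halves weight memory and is usually faster.
# If outputs come out black (NaN), fall back to torch.float32 for the VAE.
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/sdxl-vae/vae/sdxl/sdxl-vae.safetensors",
    torch_dtype=torch.float16
)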

Enable Slicing to Reduce Memory

pipe.enable_attention_slicing()  # Compute attention in slices to lower peak VRAM
pipe.enable_vae_slicing()  # Decode a batch one image at a time

Batch Processing

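# Process multiple images efficiently
# batch_images: float tensor of shape (N, 3, H, W), scaled to [-1, 1],
# on the same device and dtype as the VAE
with torch.no_grad():
    latents = vae.encode(batch_images).latent_dist.sample()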
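
When there are more images than fit in memory at once, the batch can be split into fixed-size chunks and encoded chunk by chunk. The helper below is a minimal sketch of that split. `chunked` and the file names are hypothetical, and the chunk size is something to tune for the available VRAM.

```python
def chunked(items, chunk_size):
    """Yield successive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


image_paths = [f"img_{i}.png" for i in range(10)]  # hypothetical file names
for batch in chunked(image_paths, 4):
    print(len(batch), batch[0])
```

Because the helper only relies on `len` and slicing, it works the same way on a list of paths or on a tensor sliced along its first dimension.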

Compile for Speed (PyTorch 2.0+)

vae = torch.compile(vae, mode="reduce-overhead")

Quality Improvements

Always use this VAE with SDXL models for best quality
Particularly improves fine details, textures, and faces
Most noticeable in high-resolution outputs (1024×1024+)
Reduces artifacts and improves color accuracy

Memory Management

VAE decoding is memory-intensive for large batches
Use enable_vae_slicing() for memory-constrained systems
Consider tiled VAE decoding for resolutions >1024×1024
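
In Diffusers, tiled decoding can be switched on with `vae.enable_tiling()`, which handles the splitting and blending internally. To illustrate the idea only, the sketch below computes an overlapping grid of tiles that covers a latent. The tile size, the overlap, and the function names are assumptions made for this example and do not reproduce the library's internal logic or its blending step.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles that cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile ends flush with the edge
    return starts


def tile_boxes(height, width, tile=64, overlap=16):
    """(top, left, bottom, right) boxes for every tile of a latent."""
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


# A 2048x2048 image corresponds to a 256x256 latent at 8x compression.
boxes = tile_boxes(256, 256, tile=64, overlap=16)
print(len(boxes), boxes[0], boxes[-1])  # 25 (0, 0, 64, 64) (192, 192, 256, 256)
```

Each tile is decoded separately, so peak memory depends on the tile size rather than on the full image size. The overlap gives neighboring tiles a shared border that can be blended to hide seams.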

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Citation

If you use this VAE in your research or projects, please cite:

@misc{sdxl-vae-2023,
  title={SDXL: Improved Variational Autoencoder},
  author={Stability AI},
  year={2023},
  howpublished={\url{https://huggingface.co/stabilityai/sdxl-vae}},
}

@article{rombach2022high,
  title={High-resolution image synthesis with latent diffusion models},
  author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj{\"o}rn},
  journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10684--10695},
  year={2022}
}

Resources

Official Repository: stabilityai/sdxl-vae
SDXL Base Model: stabilityai/stable-diffusion-xl-base-1.0
Diffusers Documentation: huggingface.co/docs/diffusers
Stable Diffusion Research: stability.ai/research

Technical Support

For issues specific to this VAE:

Version: v1.0
Last Updated: October 2025
Model Version: SDXL VAE 1.0
Maintained By: Local Hugging Face Repository
