spacepxl/Wan2.1-VAE-upscale2x


About datasets

#1

by qpqpqpqpqpqp - opened 3 days ago

base: refs/heads/main ← from: refs/pr/1


Files changed (1)

README.md +1 -1


@@ -12,7 +12,7 @@ base_model:
 A decoder-only finetune of the Wan2.1 VAE, with 2x upscaling integrated directly into the decoder. The main purpose of this is to kill the dreaded wan speckles/polka dots/grain, but it's also convenient for highres fix workflows. The outputs of the 2x decoder are usually much better than what you would get by running the outputs of the original decoder through an image upscale model, and even better, it's effectively free, since the compute cost of decoding is virtually unchanged. If you don't want to use the extra resolution, a slight blur and downsample will give you an original resolution image with much higher quality than the original decoder can produce.
-In particular, this VAE improves skin details and hair very significantly. It is trained almost exclusively on real images, so it may struggle with anime/lineart and text. It would be possible to finetune on anime/lineart, but I'm not aware of a suitable dataset that's licensed correctly and not just full of scraped media with massive copyright violations. If you know of an appropriate datset for this that is sourced from cc-by materials or similar, let me know and I'd be happy to try training it.
+In particular, this VAE improves skin details and hair very significantly. It is trained almost exclusively on real images, so it may struggle with anime/lineart and text. It would be possible to finetune on anime/lineart, but I'm not aware of a suitable dataset that's licensed correctly and not just full of scraped media with massive copyright violations. If you know of an appropriate dataset for this that is sourced from cc-by materials or similar, let me know and I'd be happy to try training it.
 The first released version is trained on images only, and is compatible with both Wan and Qwen since they share the same latent space. A video version is planned, but video training is more complex than image training, so it will take some time.
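
The README text in this diff mentions that "a slight blur and downsample" brings the 2x decoder output back to the original resolution. A minimal sketch of that step follows, assuming the decoded frame is already a NumPy array of shape (H, W, C) with even H and W. The function name `blur_and_downsample_2x`, the 3-tap binomial blur, and the 2x2 box average are illustrative assumptions, not the author's stated method; any mild low-pass filter followed by a 2x resize would fit the description.

```python
import numpy as np


def blur_and_downsample_2x(image: np.ndarray) -> np.ndarray:
    """Apply a slight blur, then halve the resolution of an (H, W, C) image.

    Hypothetical helper: the kernel and averaging scheme are assumptions
    chosen for illustration, not taken from the model card.
    """
    if image.ndim != 3 or image.shape[0] % 2 or image.shape[1] % 2:
        raise ValueError("expected an (H, W, C) array with even H and W")

    img = image.astype(np.float32)

    # Slight blur: separable [1, 2, 1] / 4 binomial kernel, edges replicated.
    k = np.array([1.0, 2.0, 1.0], dtype=np.float32) / 4.0

    # Vertical pass.
    p = np.pad(img, ((1, 1), (0, 0), (0, 0)), mode="edge")
    img = k[0] * p[:-2] + k[1] * p[1:-1] + k[2] * p[2:]

    # Horizontal pass.
    p = np.pad(img, ((0, 0), (1, 1), (0, 0)), mode="edge")
    img = k[0] * p[:, :-2] + k[1] * p[:, 1:-1] + k[2] * p[:, 2:]

    # Downsample: average each non-overlapping 2x2 block.
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
```

For example, a 1024x1024 RGB output from the 2x decoder would come back as a 512x512 array. The result is float32, so convert or clip it as your pipeline requires.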