---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-TI2V-5B-Diffusers
---
# SDXL latent to image
This model takes a 4-channel SDXL latent and decodes it with the [WanDecoder3d module](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers/tree/main/vae).
During a short warmup phase, the model learned the color space; after that, the imported (and modified) head of the WanDecoder3d was made part of the decoding process, which improved the stability of the output image.
```python
import torch
from torchvision import transforms
from diffusers import AutoencoderKLWan

from model import WanXL  # WanXL is defined in this repository; adjust the import path as needed

if __name__ == '__main__':
    model = WanXL()
    # Wan2.2 VAE, used only for its decoder
    vae = AutoencoderKLWan.from_pretrained('Wan-AI/Wan2.2-TI2V-5B-Diffusers', subfolder='vae')
    z = torch.randn(1, 4, 128, 128)  # SDXL latent, (B, C, H, W)
    x = model(z)  # (B, C, T, H, W)
    image = transforms.functional.to_pil_image(model.decode_by(vae, x).squeeze())
```
The SDXL latents were produced by this [VAE](https://huggingface.co/Laxhar/noobai-XL-Vpred-1.0/tree/main/vae).
As shown in the example, the target image size should preferably be 1024 px, because the original encoded data is lossily compressed.
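The 128×128 latent in the example corresponds to a 1024×1024 output image. As a minimal sketch (assuming the standard SDXL VAE layout of 4 latent channels and a spatial downscale factor of 8, consistent with the shapes above), the latent and image sizes relate as:

```python
# Hypothetical helper: latent tensor shape for a given target image size,
# assuming the standard SDXL VAE (4 latent channels, 8x spatial downscale).
def sdxl_latent_shape(height, width, batch=1, channels=4, factor=8):
    """Return the (B, C, H/factor, W/factor) latent shape."""
    return (batch, channels, height // factor, width // factor)

print(sdxl_latent_shape(1024, 1024))  # (1, 4, 128, 128), matching z in the example
```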
## Datasets
- 12TPICS
- jlbaker361/flickr_humans