|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: |
|
|
- Wan-AI/Wan2.2-TI2V-5B-Diffusers |
|
|
--- |
|
|
|
|
|
# SDXL latent to image |
|
|
|
|
|
This model takes a 4-channel SDXL latent and decodes it into an image with the [WanDecoder3d module](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers/tree/main/vae).
|
|
|
|
|
After a short warmup phase, the head of the WanDecoder3d was incorporated into the training process.
|
|
|
|
|
During the warmup, the model learned the color space; afterwards, the imported and modified head improved the stability of the decoded images.
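A minimal sketch of what importing the head could look like: filter a pretrained state dict down to the head parameters and load only those into the custom decoder. The `conv_out` key prefix and the tensor shapes below are illustrative assumptions, not the actual WanDecoder3d parameter names.

```python
import torch

# Illustrative pretrained state dict; the 'conv_out' prefix for the
# decoder head is an assumption, not the real WanDecoder3d naming.
pretrained = {
    'mid_block.weight': torch.zeros(8, 8, 3, 3),
    'conv_out.weight': torch.zeros(3, 8, 3, 3),
    'conv_out.bias': torch.zeros(3),
}

# Keep only the head parameters; these could then be loaded into the
# custom decoder with load_state_dict(head_state, strict=False).
head_state = {k: v for k, v in pretrained.items() if k.startswith('conv_out')}
print(sorted(head_state))  # ['conv_out.bias', 'conv_out.weight']
```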
|
|
|
|
|
```python
import torch
from diffusers import AutoencoderKLWan
from torchvision import transforms

# WanXL is the model defined in this repository.

if __name__ == '__main__':
    model = WanXL()
    vae = AutoencoderKLWan.from_pretrained('Wan-AI/Wan2.2-TI2V-5B-Diffusers', subfolder='vae')

    # A 4-channel SDXL latent; here random noise stands in for a real one.
    z = torch.randn(1, 4, 128, 128)  # (B, C, H, W)

    # Translate the SDXL latent into the Wan latent space (adds a time axis).
    x = model(z)  # (B, C, T, H, W)

    # Decode with the Wan VAE and convert the single frame to a PIL image.
    image = transforms.functional.to_pil_image(model.decode_by(vae, x).squeeze())
```
|
|
|
|
|
The SDXL latent was generated with this [VAE](https://huggingface.co/Laxhar/noobai-XL-Vpred-1.0/tree/main/vae).
|
|
|
|
|
As shown in the example, a target image size of 1024px is preferred, since the original latent encoding is lossy.
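The 1024px target follows from the latent shape in the example above. Assuming the linked VAE uses the standard SDXL design (8x spatial compression, 4 latent channels), the arithmetic is:

```python
# Why the example targets 1024px: an SDXL-style VAE downsamples each
# spatial dimension by 8x, so a 128x128 latent maps to a 1024x1024 image.
latent_h = latent_w = 128
spatial_scale = 8  # assumed SDXL VAE downsampling factor

image_h, image_w = latent_h * spatial_scale, latent_w * spatial_scale
print(image_h, image_w)  # 1024 1024
```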
|
|
|
|
|
## Datasets |
|
|
|
|
|
- 12TPICS |
|
|
- jlbaker361/flickr_humans |