Arrexel committed
Commit 62852b0 · verified · 1 Parent(s): 09bade2

Update README.md

Files changed (1): README.md (+124 -3)
---
license: apache-2.0
---

![Example images](https://huggingface.co/Arrexel/pattern-diffusion/resolve/main/examples.png)


# Model Description
- Developed by: Alex Reid
- Model type: Diffusion-based text-to-image generative model
- License: Apache 2.0
- Model Description: A foundational diffusion model trained entirely on tile-able (seamless) surface print patterns, based on the architecture of stable-diffusion-2-base.


# Overview
A major weak point of state-of-the-art image generation models continues to be seamless (repeating/tile-able) images, particularly where the image needs to appear completely flat and free of depth, such as product surfaces, textile printing, and wallpaper. To overcome this, Pattern Diffusion was trained from scratch on approximately 6.8 million tile-able patterns.

Compared to full-scale diffusion models such as SDXL or FLUX, a UNet diffusion model requires significantly less data and compute to train when all images share repeating patterns/features. Pattern Diffusion was trained in under 1,000 GPU-hours on 8x A100 at a batch size of 2048, for a total of 65,000 steps. Training was done in four stages, starting at 256x256 and increasing by 256 px per stage (256, 512, 768, 1024); each stage ran until its FID and CLIP scores stopped improving.
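
As a rough consistency check (not a figure from the original model card): 65,000 steps at a batch size of 2048 is about 133 million training samples, or roughly 20 passes over the 6.8 million-image dataset, assuming the step count spans all four stages.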

Also available below is an example inference implementation that produces optimal results for tile-able image generation by combining noise rolling with circular padding on Conv2d layers.
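
Before the full script, here is a minimal, self-contained sketch (illustrative only, not part of the model's code) of what circular padding means: the tensor wraps around at its edges, so a convolution sees the image as if it tiled infinitely in every direction.

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.0).reshape(1, 1, 4, 4)  # a tiny 4x4 "image"

# Zero padding (the Conv2d default) surrounds the image with zeros,
# so kernels near the border see content that a tiled print would not have.
zero_padded = F.pad(x, (1, 1, 1, 1), mode="constant", value=0.0)

# Circular padding wraps the opposite edges around instead, matching
# what a kernel would see if the image repeated seamlessly.
circular_padded = F.pad(x, (1, 1, 1, 1), mode="circular")

print(zero_padded[0, 0])
print(circular_padded[0, 0])
```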

# Commercial Use
Pattern Diffusion is released under the Apache 2.0 license and is available for both research and commercial use with no attribution required.


# Strong Areas
- Excellent at generating floral and abstract patterns
- Handles prompts that mix seemingly random concepts, often producing beautiful results
- Fast inference speeds


# Limitations
- Cannot generate coherent text
- Struggles with anatomically correct living creatures due to the limited size of the dataset, often producing the wrong number of limbs or mirrored bodies
- Works for simple geometric patterns (such as checkerboards) but frequently produces inconsistent geometry


# Example Usage
Below is an example script that produces the best-scoring (CLIP and FID) results while leaving no visible seams in generated images. Most public techniques for making seamless images with diffusion models involve setting all Conv2d layers to use circular padding; however, testing shows that this significantly harms FID and CLIP scores, both on Pattern Diffusion and on other models such as Stable Diffusion 1.5 and SDXL. This can be overcome by enabling circular padding only in the late steps of the diffusion process, after the majority of the features have already been denoised. When doing this, noise rolling must be applied from the start of inference to ensure any prominent features are made seamless across the image border. With both noise rolling and late-stage circular Conv2d padding, there is no measurable decrease in FID or CLIP scores relative to the unmodified inference setup.

```python
import torch
import torch.nn as nn
from torch import Tensor
from torch.nn import Conv2d
from torch.nn import functional as F
from torch.nn.modules.utils import _pair
from typing import Optional

import diffusers
from diffusers import StableDiffusionPipeline, DDPMScheduler


# Replacement for Conv2d._conv_forward that applies circular padding on both
# axes before convolving, so the convolution treats the image as a torus.
def asymmetricConv2DConvForward_circular(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
    # _reversed_padding_repeated_twice is (left, right, top, bottom) in F.pad order
    self.paddingX = (
        self._reversed_padding_repeated_twice[0],
        self._reversed_padding_repeated_twice[1],
        0,
        0,
    )
    self.paddingY = (
        0,
        0,
        self._reversed_padding_repeated_twice[2],
        self._reversed_padding_repeated_twice[3],
    )
    working = F.pad(input, self.paddingX, mode="circular")
    working = F.pad(working, self.paddingY, mode="circular")

    # Padding has already been applied, so the convolution itself pads by zero
    return F.conv2d(working, weight, bias, self.stride, _pair(0), self.dilation, self.groups)


# Sets the padding mode to circular on every Conv2d in the model
def make_seamless(model):
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            # Workaround so diffusers' LoRACompatibleConv takes the code path
            # that calls _conv_forward
            if isinstance(module, diffusers.models.lora.LoRACompatibleConv) and module.lora_layer is None:
                module.lora_layer = lambda *x: 0
            module._conv_forward = asymmetricConv2DConvForward_circular.__get__(module, Conv2d)


# Sets the padding mode back to the Conv2d default
def disable_seamless(model):
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            if isinstance(module, diffusers.models.lora.LoRACompatibleConv) and module.lora_layer is None:
                module.lora_layer = lambda *x: 0
            module._conv_forward = nn.Conv2d._conv_forward.__get__(module, Conv2d)


# Runs at the end of every inference step
def diffusion_callback(pipe, step_index, timestep, callback_kwargs):
    # Switch the UNet and VAE to circular Conv2d padding for the last 20% of steps
    if step_index == int(pipe.num_timesteps * 0.8):
        make_seamless(pipe.unet)
        make_seamless(pipe.vae)

    # Noise rolling: for the first 80% of steps, shift the latents and wrap them
    # around the edges so prominent features are denoised across the image border
    if step_index < int(pipe.num_timesteps * 0.8):
        callback_kwargs["latents"] = torch.roll(callback_kwargs["latents"], shifts=(64, 64), dims=(2, 3))

    return callback_kwargs


pipe = StableDiffusionPipeline.from_pretrained(
    "Arrexel/pattern-diffusion",
    torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

# Make sure circular padding is disabled before starting inference, as it should
# only be enabled for the last 20% of steps. This is not necessary when generating
# a single image, since padding is at its default when the pipe first loads.
disable_seamless(pipe.unet)
disable_seamless(pipe.vae)

output = pipe(
    num_inference_steps=50,
    prompt="Vibrant watercolor floral pattern with pink, purple, and blue flowers against a white background.",
    width=1024,
    height=1024,
    callback_on_step_end=diffusion_callback
).images[0]
output.save("example.png")
```
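
To confirm the result actually tiles, a quick sanity check (not part of the original script; the filename is just the one saved above) is to paste the output into a 2x2 grid and inspect the interior borders for seams:

```python
from PIL import Image

img = Image.open("example.png")

# Paste the image into a 2x2 grid; any seam shows up where the copies meet
tiled = Image.new("RGB", (img.width * 2, img.height * 2))
for dx in (0, img.width):
    for dy in (0, img.height):
        tiled.paste(img, (dx, dy))
tiled.save("example_tiled.png")
```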