Arrexel committed
Commit 62852b0 · verified · 1 Parent(s): 09bade2

Update README.md

Files changed (1): README.md (+124 -3)
---
license: apache-2.0
---

![Example images](https://huggingface.co/Arrexel/pattern-diffusion/resolve/main/examples.png)


# Model Description
- Developed by: Alex Reid
- Model type: Diffusion-based text-to-image generative model
- License: Apache 2.0
- Model Description: A foundational diffusion model trained entirely on tile-able (seamless) surface print patterns, based on the architecture of stable-diffusion-2-base.


# Overview
A major weak point of state-of-the-art image generation models continues to be seamless (repeating/tile-able) images, particularly where the image needs to appear completely flat and free of depth, such as product surfaces, textile printing, and wallpaper. To overcome this, Pattern Diffusion was trained from scratch on approximately 6.8 million tile-able patterns.

Compared to full-scale diffusion models such as SDXL or FLUX, a UNet diffusion model requires significantly less data and compute to train when all images share repeating patterns/features. Pattern Diffusion was trained in under 1,000 GPU-hours on 8x A100 at a batch size of 2048, for a total of 65,000 steps. Training was done in four stages, starting at 256x256 and increasing by 256 px per stage (256, 512, 768, 1024); each stage ran until its FID and CLIP scores stopped improving.
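
As a rough consistency check (not a figure from the original model card): 65,000 steps at a batch size of 2048 is about 133 million training samples, or roughly 20 passes over the 6.8 million-image dataset, assuming the step count spans all four stages.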

Also available below is an example inference implementation that produces optimal results for tile-able image generation by combining noise rolling with circular padding on Conv2d layers.
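
Before the full script, here is a minimal, self-contained sketch (illustrative only, not part of the model's code) of what circular padding means: the tensor wraps around at its edges, so a convolution sees the image as if it tiled infinitely in every direction.

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.0).reshape(1, 1, 4, 4)  # a tiny 4x4 "image"

# Zero padding (the Conv2d default) surrounds the image with zeros,
# so kernels near the border see content that a tiled print would not have.
zero_padded = F.pad(x, (1, 1, 1, 1), mode="constant", value=0.0)

# Circular padding wraps the opposite edges around instead, matching
# what a kernel would see if the image repeated seamlessly.
circular_padded = F.pad(x, (1, 1, 1, 1), mode="circular")

print(zero_padded[0, 0])
print(circular_padded[0, 0])
```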

# Commercial Use
Pattern Diffusion is released under the Apache 2.0 license and is available for both research and commercial use with no attribution required.


# Strong Areas
- Excellent at generating floral and abstract patterns
- Handles prompts that mix seemingly random concepts, often producing beautiful results
- Fast inference speeds


# Limitations
- Cannot generate coherent text
- Struggles with anatomically correct living creatures due to the limited size of the dataset, often producing the wrong number of limbs or mirrored bodies
- Works for simple geometric patterns (such as checkerboards) but frequently produces inconsistent geometry


# Example Usage
Below is an example script that produces the best-scoring (CLIP and FID) results while leaving no visible seams in generated images. Most public techniques for making seamless images with diffusion models involve setting all Conv2d layers to use circular padding; however, testing shows that this significantly harms FID and CLIP scores, both on Pattern Diffusion and on other models such as Stable Diffusion 1.5 and SDXL. This can be overcome by enabling circular padding only in the late steps of the diffusion process, after the majority of the features have already been denoised. When doing this, noise rolling must be applied from the start of inference to ensure any prominent features are made seamless across the image border. With both noise rolling and late-stage circular Conv2d padding, there is no measurable decrease in FID or CLIP scores relative to the unmodified inference setup.

```python
import torch
import torch.nn as nn
from torch import Tensor
from torch.nn import Conv2d
from torch.nn import functional as F
from torch.nn.modules.utils import _pair
from typing import Optional

import diffusers
from diffusers import StableDiffusionPipeline, DDPMScheduler


# Replacement for Conv2d._conv_forward that applies circular padding on both
# axes before convolving, so the convolution treats the image as a torus.
def asymmetricConv2DConvForward_circular(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
    # _reversed_padding_repeated_twice is (left, right, top, bottom) in F.pad order
    self.paddingX = (
        self._reversed_padding_repeated_twice[0],
        self._reversed_padding_repeated_twice[1],
        0,
        0,
    )
    self.paddingY = (
        0,
        0,
        self._reversed_padding_repeated_twice[2],
        self._reversed_padding_repeated_twice[3],
    )
    working = F.pad(input, self.paddingX, mode="circular")
    working = F.pad(working, self.paddingY, mode="circular")

    # Padding has already been applied, so the convolution itself pads by zero
    return F.conv2d(working, weight, bias, self.stride, _pair(0), self.dilation, self.groups)


# Sets the padding mode to circular on every Conv2d in the model
def make_seamless(model):
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            # Workaround so diffusers' LoRACompatibleConv takes the code path
            # that calls _conv_forward
            if isinstance(module, diffusers.models.lora.LoRACompatibleConv) and module.lora_layer is None:
                module.lora_layer = lambda *x: 0
            module._conv_forward = asymmetricConv2DConvForward_circular.__get__(module, Conv2d)


# Sets the padding mode back to the Conv2d default
def disable_seamless(model):
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            if isinstance(module, diffusers.models.lora.LoRACompatibleConv) and module.lora_layer is None:
                module.lora_layer = lambda *x: 0
            module._conv_forward = nn.Conv2d._conv_forward.__get__(module, Conv2d)


# Runs at the end of every inference step
def diffusion_callback(pipe, step_index, timestep, callback_kwargs):
    # Switch the UNet and VAE to circular Conv2d padding for the last 20% of steps
    if step_index == int(pipe.num_timesteps * 0.8):
        make_seamless(pipe.unet)
        make_seamless(pipe.vae)

    # Noise rolling: for the first 80% of steps, shift the latents and wrap them
    # around the edges so prominent features are denoised across the image border
    if step_index < int(pipe.num_timesteps * 0.8):
        callback_kwargs["latents"] = torch.roll(callback_kwargs["latents"], shifts=(64, 64), dims=(2, 3))

    return callback_kwargs


pipe = StableDiffusionPipeline.from_pretrained(
    "Arrexel/pattern-diffusion",
    torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

# Make sure circular padding is disabled before starting inference, as it should
# only be enabled for the last 20% of steps. This is not necessary when generating
# a single image, since padding is at its default when the pipe first loads.
disable_seamless(pipe.unet)
disable_seamless(pipe.vae)

output = pipe(
    num_inference_steps=50,
    prompt="Vibrant watercolor floral pattern with pink, purple, and blue flowers against a white background.",
    width=1024,
    height=1024,
    callback_on_step_end=diffusion_callback
).images[0]
output.save("example.png")
```
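
To confirm the result actually tiles, a quick sanity check (not part of the original script; the filename is just the one saved above) is to paste the output into a 2x2 grid and inspect the interior borders for seams:

```python
from PIL import Image

img = Image.open("example.png")

# Paste the image into a 2x2 grid; any seam shows up where the copies meet
tiled = Image.new("RGB", (img.width * 2, img.height * 2))
for dx in (0, img.width):
    for dy in (0, img.height):
        tiled.paste(img, (dx, dy))
tiled.save("example_tiled.png")
```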