---
base_model: black-forest-labs/FLUX.1-dev
library_name: diffusers
license: other
inference: true
tags:
  - flux
  - flux-diffusers
  - text-to-image
  - diffusers
  - control
  - diffusers-training
---

Flux Edit

These are control weights fine-tuned from black-forest-labs/FLUX.1-dev on the TIGER-Lab/OmniEdit-Filtered-1.2M dataset for image editing. We use the Flux Control framework for fine-tuning.

License

Please adhere to the licensing terms of the underlying black-forest-labs/FLUX.1-dev model.

Intended uses & limitations

Inference

from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
import torch

# Load the fine-tuned edit transformer and plug it into the Flux Control pipeline.
path = "sayakpaul/FLUX.1-dev-edit-v0" # to change
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")

# The source image to edit acts as the control image.
image = load_image("./assets/mushroom.jpg") # resize as needed.
print(image.size)

prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30., # change this as needed.
    num_inference_steps=50, # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
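
Regarding the "resize as needed" comment above: the control image's dimensions should be divisible by 16, since Flux downsamples by 8x in the VAE and then packs latents 2x2. A minimal, hypothetical helper (not part of the original card) could look like:

from PIL import Image

# Hypothetical helper: snap an image to dimensions divisible by 16 so it
# survives Flux's 8x VAE downsampling plus 2x2 latent packing cleanly.
def resize_to_multiple_of_16(img: Image.Image) -> Image.Image:
    width, height = img.size
    return img.resize((width // 16 * 16, height // 16 * 16))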

Speeding inference with a turbo LoRA

We can speed up inference by using a turbo LoRA such as ByteDance/Hyper-SD, which lets us reduce num_inference_steps substantially while still producing a good image.

Make sure to install peft before running the code below: pip install -U peft.

Code
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch

path = "sayakpaul/FLUX.1-dev-edit-v0" # to change
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")

# load the turbo LoRA
pipeline.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
)
# 0.125 is the adapter strength commonly used with the Hyper-FLUX 8-step LoRA.
pipeline.set_adapters(["hyper-sd"], adapter_weights=[0.125])

image = load_image("./assets/mushroom.jpg") # resize as needed.
print(image.size)

prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30., # change this as needed.
    num_inference_steps=8, # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
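
Optionally, you can fuse the LoRA into the transformer weights to avoid the adapter overhead at each forward pass. This is a standard Diffusers call, shown here as a suggestion rather than part of the original recipe:

# Optional: fuse the LoRA into the base weights at the chosen scale.
pipeline.fuse_lora(lora_scale=0.125)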


Comparison
(Side-by-side comparison of 50-step and 8-step outputs across four example edits; images omitted.)

You can also apply quantization if the pipeline's memory requirements still exceed what your hardware can satisfy. Refer to the Diffusers documentation to learn more.
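
As a minimal sketch (assuming bitsandbytes is installed: pip install bitsandbytes), the edit transformer can be loaded in 4-bit NF4 via Diffusers' BitsAndBytesConfig; exact settings are a suggestion, not the card's prescribed configuration:

from diffusers import BitsAndBytesConfig, FluxControlPipeline, FluxTransformer2DModel
import torch

# Quantize only the transformer; the rest of the pipeline stays in bf16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
edit_transformer = FluxTransformer2DModel.from_pretrained(
    "sayakpaul/FLUX.1-dev-edit-v0", quantization_config=quant_config, torch_dtype=torch.bfloat16
)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
)
# Offload submodules to CPU when idle to further reduce peak VRAM.
pipeline.enable_model_cpu_offload()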

guidance_scale also impacts the results:

Each source image was edited at guidance_scale values of 10, 20, 30, and 40 (images omitted). The prompts used:

  • Give this the look of a traditional Japanese woodblock print.
  • transform the setting to a winter scene
  • turn the color of mushroom to gray
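
To reproduce such a sweep with the pipeline and prompt from the inference example above (a hypothetical sketch; source stands in for the unedited control image):

# Hypothetical sweep: `source` is the unedited control image loaded earlier.
for gs in (10.0, 20.0, 30.0, 40.0):
    edited = pipeline(
        control_image=source,
        prompt=prompt,
        guidance_scale=gs,
        num_inference_steps=50,
        max_sequence_length=512,
        height=source.height,
        width=source.width,
        generator=torch.manual_seed(0),
    ).images[0]
    edited.save(f"edited_gs_{int(gs)}.png")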

Limitations and bias

Expect the model to underperform in some cases, since the exact training details of the original Flux Control models are not known.

Training details

The fine-tuning codebase is available here. Training hyperparameters:

  • Per GPU batch size: 4
  • Gradient accumulation steps: 4
  • Guidance scale: 30
  • BF16 mixed-precision
  • AdamW optimizer (8-bit, from bitsandbytes; see the sketch after this list)
  • Constant learning rate of 5e-5
  • Weight decay of 1e-6
  • 20000 training steps

Training was conducted on a single node of 8×H100 GPUs.
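
A minimal sketch of the optimizer setup implied by the hyperparameters above, assuming bitsandbytes' 8-bit AdamW (variable names are illustrative):

import bitsandbytes as bnb

# 8-bit AdamW with the constant learning rate and weight decay listed above.
optimizer = bnb.optim.AdamW8bit(
    transformer.parameters(),  # `transformer` is the Flux transformer being fine-tuned
    lr=5e-5,
    weight_decay=1e-6,
)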

We used a simplified flow mechanism to perform the linear interpolation. In pseudo-code, that looks like:

# Sample a per-example noise level uniformly in [0, 1).
sigmas = torch.rand(batch_size)
# Map each continuous noise level to a discrete scheduler timestep.
timesteps = (sigmas * noise_scheduler.config.num_train_timesteps).long()
...

# Linearly interpolate between the clean latents (sigma=0) and pure noise (sigma=1).
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise

where pixel_latents is computed from the source images and noise is drawn from a Gaussian distribution. For more details, check out the repository.
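
For illustration, a self-contained version of that interpolation with explicit broadcasting might look like the following (shapes and names are hypothetical, not the actual training code):

import torch

batch_size, channels, height, width = 4, 16, 64, 64
pixel_latents = torch.randn(batch_size, channels, height, width)  # stand-in for encoded source images
noise = torch.randn_like(pixel_latents)                           # Gaussian noise

sigmas = torch.rand(batch_size)        # one noise level per example
sigmas = sigmas.view(-1, 1, 1, 1)      # broadcast over (C, H, W)

# sigma=0 recovers the clean latents; sigma=1 gives pure noise.
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise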