---
base_model: black-forest-labs/FLUX.1-dev
library_name: diffusers
license: other
inference: true
tags:
- flux
- flux-diffusers
- text-to-image
- diffusers
- control
- diffusers-training
---
# Flux Edit
These are control weights trained on [black-forest-labs/FLUX.1-dev](https://hf.co/black-forest-labs/FLUX.1-dev)
with the [TIGER-Lab/OmniEdit-Filtered-1.2M](https://huggingface.co/datasets/TIGER-Lab/OmniEdit-Filtered-1.2M) dataset for image editing. We use the
[Flux Control framework](https://blackforestlabs.ai/flux-1-tools/) for fine-tuning.
## License
Please adhere to the licensing terms as described [here](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md).
## Intended uses & limitations
### Inference
```py
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
import torch
path = "sayakpaul/FLUX.1-dev-edit-v0" # to change
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = load_image("./assets/mushroom.jpg")  # resize as needed.
print(image.size)
prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30.,  # change this as needed.
    num_inference_steps=50,  # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0),
).images[0]
image.save("edited_image.png")
```
### Speeding up inference with a turbo LoRA
We can speed up inference with a turbo LoRA like [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD), which lets us reduce `num_inference_steps` while still producing a nice image.
Make sure to install `peft` before running the code below: `pip install -U peft`.
<details>
<summary>Code</summary>
```py
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch
path = "sayakpaul/FLUX.1-dev-edit-v0" # to change
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")
# load the turbo LoRA
pipeline.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
)
pipeline.set_adapters(["hyper-sd"], adapter_weights=[0.125])
image = load_image("./assets/mushroom.jpg")  # resize as needed.
print(image.size)
prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30.,  # change this as needed.
    num_inference_steps=8,  # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0),
).images[0]
image.save("edited_image.png")
```
</details>
<br><br>
<details>
<summary>Comparison</summary>
<table align="center">
<tr>
<th>50 steps</th>
<th>8 steps</th>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_car.jpg" alt="50 steps 1" width="150"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_car.jpg" alt="8 steps 1" width="150"></td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_norte_dam.jpg" alt="50 steps 2" width="150"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_norte_dam.jpg" alt="8 steps 2" width="150"></td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_mushroom.jpg" alt="50 steps 3" width="150"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_mushroom.jpg" alt="8 steps 3" width="150"></td>
</tr>
<tr>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_green_creature.jpg" alt="50 steps 4" width="150"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_green_creature.jpg" alt="8 steps 4" width="150"></td>
</tr>
</table>
</details>
You can also perform quantization if the pipeline's memory requirements cannot be satisfied by your hardware. Refer to the [Diffusers documentation](https://huggingface.co/docs/diffusers/main/en/quantization/overview) to learn more.
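For example, here is a minimal sketch of loading the edit transformer in 4-bit with `bitsandbytes` (this assumes a recent `diffusers` release with `bitsandbytes` installed; adjust the settings to your hardware):
```py
from diffusers import BitsAndBytesConfig, FluxControlPipeline, FluxTransformer2DModel
import torch

# Quantize the transformer to 4-bit NF4 to reduce memory usage.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
edit_transformer = FluxTransformer2DModel.from_pretrained(
    "sayakpaul/FLUX.1-dev-edit-v0", quantization_config=quant_config, torch_dtype=torch.bfloat16
)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
)
# Offload the remaining components to CPU instead of placing everything on the GPU at once.
pipeline.enable_model_cpu_offload()
```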
`guidance_scale` also impacts the results; a sketch for sweeping it follows the table below:
<table align="center">
<tr>
<th>Source Image</th>
<th>Edited Image (gs: 10)</th>
<th>Edited Image (gs: 20)</th>
<th>Edited Image (gs: 30)</th>
<th>Edited Image (gs: 40)</th>
</tr>
<tr>
<td align="center">
<img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/car.jpg" alt="Source Image 1" width="150"><br>
<em>Give this the look of a traditional Japanese woodblock print.</em>
</td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-10_car.jpg" alt="Edited Image gs 10"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-20_car.jpg" alt="Edited Image gs 20"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-30_car.jpg" alt="Edited Image gs 30"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-40_car.jpg" alt="Edited Image gs 40"></td>
</tr>
<tr>
<td align="center">
<img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/green_creature.jpg" alt="Source Image 2" width="150"><br>
<em>transform the setting to a winter scene</em>
</td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-10_green_creature.jpg" alt="Edited Image gs 10"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-20_green_creature.jpg" alt="Edited Image gs 20"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-30_green_creature.jpg" alt="Edited Image gs 30"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-40_green_creature.jpg" alt="Edited Image gs 40"></td>
</tr>
<tr>
<td align="center">
<img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg" alt="Source Image 3" width="150"><br>
<em>turn the color of mushroom to gray</em>
</td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-10_mushroom.jpg" alt="Edited Image gs 10"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-20_mushroom.jpg" alt="Edited Image gs 20"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-30_mushroom.jpg" alt="Edited Image gs 30"></td>
<td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-40_mushroom.jpg" alt="Edited Image gs 40"></td>
</tr>
</table>
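To run such a sweep on your own images, something like the sketch below would work (it reuses the `pipeline`, `image`, and `prompt` from the inference example above; the output file names are only illustrative):
```py
# Sweep over guidance_scale and save one edited image per value.
for gs in [10.0, 20.0, 30.0, 40.0]:
    edited = pipeline(
        control_image=image,
        prompt=prompt,
        guidance_scale=gs,
        num_inference_steps=50,
        max_sequence_length=512,
        height=image.height,
        width=image.width,
        generator=torch.manual_seed(0),
    ).images[0]
    edited.save(f"edited_gs-{int(gs)}.png")
```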
### Limitations and bias
Expect the model to perform somewhat underwhelmingly, since the exact training details of Flux Control are not publicly known.
## Training details
The fine-tuning codebase is available [here](https://github.com/sayakpaul/flux-image-editing). Training hyperparameters:
* Per GPU batch size: 4
* Gradient accumulation steps: 4
* Guidance scale: 30
* BF16 mixed-precision
* AdamW optimizer (8bit from `bitsandbytes`)
* Constant learning rate of 5e-5
* Weight decay of 1e-6
* 20,000 training steps
Training was conducted on a single node of 8 H100 GPUs.
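A minimal sketch of the optimizer setup implied by these hyperparameters (the exact code lives in the repository linked above; `flux_transformer` is an illustrative variable name):
```py
import bitsandbytes as bnb

# 8-bit AdamW with a constant learning rate of 5e-5 and a weight decay of 1e-6.
optimizer = bnb.optim.AdamW8bit(
    flux_transformer.parameters(),
    lr=5e-5,
    weight_decay=1e-6,
)
# Effective batch size: 4 (per GPU) x 4 (gradient accumulation) x 8 (GPUs) = 128.
```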
We used a simplified flow mechanism to perform the linear interpolation between the clean latents and noise. In pseudo-code, that looks like:
```py
# Sample a random interpolation level per example and map it to a discrete timestep.
sigmas = torch.rand(batch_size)
timesteps = (sigmas * noise_scheduler.config.num_train_timesteps).long()
...
# Linearly interpolate between the clean latents and Gaussian noise.
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise
```
where `pixel_latents` is computed from the source images and `noise` is drawn from a Gaussian distribution. For more details, check out
the repository.
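Given this interpolation, the flow-matching target follows directly: the velocity of `noisy_model_input` with respect to `sigmas` is `noise - pixel_latents`, which (as is standard in flow matching, and presumably in the repository) is what the transformer's prediction is regressed against. A self-contained toy sketch, with shapes that are purely illustrative:
```py
import torch

batch_size, seq_len, dim = 2, 16, 64  # toy shapes for illustration only
pixel_latents = torch.randn(batch_size, seq_len, dim)
noise = torch.randn_like(pixel_latents)

# Broadcast sigmas over the non-batch dimensions before interpolating.
sigmas = torch.rand(batch_size).view(-1, 1, 1)
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise

# d(noisy_model_input)/d(sigma) = noise - pixel_latents: the flow-matching target.
target = noise - pixel_latents
```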