---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
tags:
- Text-to-Image
- ControlNet
- Diffusers
- Stable Diffusion
base_model: black-forest-labs/FLUX.1-dev
---
					
						
# Found some bugs, currently fixing them. Please do not download until the fixes are applied.

# FLUX.1-dev Controlnet

<img src="./images/image_union.png" width="1000" />
					
						
## Diffusers version

Until the next Diffusers PyPI release, please install Diffusers from source and use [this PR](https://github.com/huggingface/diffusers/pull/9175) to run this model.
TODO: update this section when the new version is released.
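
Before running the demo, it can help to confirm that the installed Diffusers build actually contains the Flux ControlNet code. The following is a minimal sketch, not an official check: `DEMO_MODULES` and `modules_available` are hypothetical names introduced here, and the two module paths are simply the ones the demo in this README imports. It uses only the standard library and does not load any model weights.

```python
import importlib.util

# Module paths imported by the demo in this README (copied from its import lines).
DEMO_MODULES = (
    "diffusers.pipelines.flux.pipeline_flux_controlnet",
    "diffusers.models.controlnet_flux",
)


def modules_available(names=DEMO_MODULES) -> bool:
    """Return True only if every dotted module path in `names` can be located."""
    try:
        return all(importlib.util.find_spec(name) is not None for name in names)
    except ImportError:
        # Raised when a parent package (for example `diffusers` itself) is missing.
        return False
```

If `modules_available()` returns `False`, the installed Diffusers most likely does not provide these module paths, and installing from source as described above is the next step to try.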
					
						
## Checkpoint

Training a union ControlNet requires a significant amount of computational power.
The current release is only an alpha checkpoint that has not been fully trained.
The beta version is still in training.
We have run ablation studies that validate the code.
The alpha version is open-sourced solely to support the rapid growth of the open-source community and the Flux ecosystem;
bad cases are common (please accept my apologies).
Note that even a fully trained Union model may not perform as well as specialized models, such as a pose-control model.
However, as training progresses, the performance of the Union model will continue to approach that of specialized models.
					
						
## Control Mode

| Control Mode | Description | Current Model Validity |
|:------------:|:-----------:|:-----------:|
|0|canny|🟢high|
|1|tile|🟢high|
|2|depth|🟢high|
|3|blur|🟢high|
|4|pose|🟢high|
|5|gray|🔴low|
|6|lq|🟢high|
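
The table above can also be expressed in code. The sketch below is illustrative only: `CONTROL_MODES`, `LOW_VALIDITY_MODES`, and `resolve_control_mode` are hypothetical names, not part of Diffusers or this repository. It maps a mode name to the integer passed as `control_mode` and warns when the table lists the current checkpoint's validity for that mode as low.

```python
import warnings

# Mode name -> integer id, copied from the Control Mode table.
CONTROL_MODES = {
    "canny": 0,
    "tile": 1,
    "depth": 2,
    "blur": 3,
    "pose": 4,
    "gray": 5,
    "lq": 6,
}

# Modes the table marks as low validity for the current alpha checkpoint.
LOW_VALIDITY_MODES = {"gray"}


def resolve_control_mode(name: str) -> int:
    """Return the control_mode id for a mode name (hypothetical helper)."""
    key = name.strip().lower()
    if key not in CONTROL_MODES:
        raise ValueError(
            f"Unknown control mode {name!r}; expected one of {sorted(CONTROL_MODES)}"
        )
    if key in LOW_VALIDITY_MODES:
        warnings.warn(
            f"Control mode {key!r} is marked low validity for the alpha checkpoint."
        )
    return CONTROL_MODES[key]
```

For example, `resolve_control_mode("pose")` returns `4`, while `resolve_control_mode("gray")` returns `5` and emits a warning.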
					
						
# Demo

```python
import torch
from diffusers.utils import load_image
from diffusers.pipelines.flux.pipeline_flux_controlnet import FluxControlNetPipeline
from diffusers.models.controlnet_flux import FluxControlNetModel

# Load the base model and the union ControlNet
base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model = 'InstantX/FLUX.1-dev-Controlnet-Union-alpha'
controlnet = FluxControlNetModel.from_pretrained(controlnet_model, torch_dtype=torch.bfloat16)
pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Image config
width, height = 1024, 1024
controlnet_conditioning_scale = 0.5
seed = 6666

# One example per control mode: name -> (control_mode, control image file, prompt)
image_root = "https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union-alpha/resolve/main/images"
examples = {
    "canny": (0, "canny.jpg", "A girl in city, 25 years old, cool, futuristic."),
    "tile": (1, "tile.jpg", "A girl, 25 years old."),
    "depth": (2, "depth.jpg", "A girl in city, 25 years old, cool, futuristic."),
    "blur": (3, "blur.jpg", "A girl, 25 years old."),
    "pose": (4, "pose.jpg", "A girl in city, 25 years old, cool, futuristic."),
    "gray": (5, "gray.jpg", "A girl, 25 years old."),
    "lq": (6, "lq.jpg", "A girl in city"),  # low quality
}

# Pick ONE example to run; change the key to try another control mode
control_mode, image_file, prompt = examples["canny"]
control_image = load_image(f"{image_root}/{image_file}")

# Generate
image = pipe(
    prompt,
    control_image=control_image,
    control_mode=control_mode,
    width=width,
    height=height,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    num_inference_steps=28,
    guidance_scale=3.5,
    generator=torch.manual_seed(seed),
).images[0]
image.save("image.jpg")
```
					
						
# Acknowledgements

Thank you, [zzzzzero](https://github.com/zzzzzero), for pointing out the bug in the model.