|
--- |
|
pipeline_tag: text-to-image |
|
inference: false |
|
license: other |
|
license_name: stabilityai-ai-community |
|
license_link: LICENSE.md |
|
tags: |
|
- tensorrt |
|
- sd3.5-large |
|
- text-to-image |
|
- depth |
|
- canny |
|
- blur |
|
- controlnet |
|
- onnx |
|
extra_gated_prompt: >- |
|
By clicking "Agree", you agree to the [License |
|
Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) |
|
and acknowledge Stability AI's [Privacy |
|
Policy](https://stability.ai/privacy-policy). |
|
extra_gated_fields: |
|
Name: text |
|
Email: text |
|
Country: country |
|
Organization or Affiliation: text |
|
Receive email updates and promotions on Stability AI products, services, and research?: |
|
type: select |
|
options: |
|
- 'Yes' |
|
- 'No' |
|
What do you intend to use the model for?: |
|
type: select |
|
options: |
|
- Research |
|
- Personal use |
|
- Creative Professional |
|
- Startup |
|
- Enterprise |
|
I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox |
|
language: |
|
- en |
|
--- |
|
|
|
# Stable Diffusion 3.5 Large ControlNet TensorRT |
|
## Introduction |
|
|
|
This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Large ControlNets**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model. |
|
|
|
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications. |
|
|
|
The following control types are available: |
|
|
|
- Canny - Use a Canny edge map to guide the structure of the generated image. This is especially useful for illustrations, but works with all styles. |
|
|
|
- Depth - Use a depth map, generated by DepthFM, to guide generation. Example use cases include generating architectural renderings or texturing 3D assets.
|
|
|
- Blur - Can be used to perform extremely high-fidelity upscaling. A common use case is to tile an input image, apply the ControlNet to each tile, and merge the tiles to produce a higher-resolution image (see the preprocessing sketch below).
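
Each ControlNet consumes a conditioning image of the matching type. As a rough illustration only (the demo script in the usage example below prepares its own conditioning inputs), a Canny edge map or a blur condition could be produced with OpenCV; the file names and parameters here are placeholders:

```python
# Illustrative preprocessing sketch, not part of the TensorRT demo.
# File names and thresholds are placeholders.
import cv2

image = cv2.imread("input.png")  # placeholder input image

# Canny: an edge map that constrains the structure of the generated image.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("canny_control.png", edges)

# Blur: a heavily blurred copy of the input, used as the upscaling condition.
blurred = cv2.GaussianBlur(image, ksize=(51, 51), sigmaX=0)
cv2.imwrite("blur_control.png", blurred)
```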
|
|
|
## Model Details |
|
|
|
### Model Description |
|
This repository holds the ONNX exports of the Depth, Canny, and Blur ControlNet models in BF16 precision.
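
To check what each export expects, the graphs can be inspected with the `onnx` Python package. This is a minimal sketch; the file name is a placeholder for the actual ONNX file downloaded from this repository:

```python
# Minimal sketch: list the inputs of one of the exported ControlNet graphs.
# "controlnet_depth.onnx" is a placeholder for the real file path.
import onnx

model = onnx.load("controlnet_depth.onnx")
for tensor in model.graph.input:
    shape = [d.dim_value or d.dim_param for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, shape)
```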
|
|
|
|
|
## Performance using TensorRT 10.13 |
|
#### Depth ControlNet: Timings for 40 steps at 1024x1024 |
|
|
|
|
|
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 40 | VAE Decoder | Total | |
|
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------| |
|
| H100 | BF16 | 74.97 ms | 11.87 ms | 4.90 ms | 8.82 ms | 18839.01 ms | 117.38 ms | 19097.19 ms | |
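
For a quick reading of this row, the per-step denoiser cost and an approximate end-to-end throughput follow directly from the table:

```python
# Worked example using the Depth ControlNet row (H100, BF16, 40 steps).
mmdit_total_ms = 18839.01   # "MMDiT x 40" column
total_ms = 19097.19         # "Total" column

print(mmdit_total_ms / 40)  # ~471 ms per denoising step
print(60_000 / total_ms)    # ~3.1 images per minute end to end
```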
|
|
|
#### Canny ControlNet: Timings for 60 steps at 1024x1024 |
|
|
|
|
|
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total | |
|
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------| |
|
| H100 | BF16 | 78.50 ms | 12.29 ms | 5.08 ms | 8.65 ms | 28057.08 ms | 106.49 ms | 28306.20 ms | |
|
|
|
|
|
#### Blur ControlNet: Timings for 60 steps at 1024x1024 |
|
|
|
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total | |
|
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------| |
|
| H100 | BF16 | 74.48 ms | 11.71 ms | 4.86 ms | 8.80 ms | 28604.26 ms | 113.24 ms | 28859.06 ms | |
|
|
|
|
|
|
|
## Usage Example |
|
1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) for launching a TensorRT NGC container.
|
```shell |
|
git clone https://github.com/NVIDIA/TensorRT.git |
|
cd TensorRT |
|
git checkout release/sd35 |
|
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash |
|
``` |
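
The remaining steps are run inside this container, with the cloned repository mounted at `/workspace`.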
|
|
|
|
|
2. Install libraries and requirements |
|
```shell |
|
cd demo/Diffusion |
|
python3 -m pip install --upgrade pip |
|
pip3 install -r requirements.txt |
|
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12 |
|
``` |
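
Optionally, confirm that the TensorRT Python bindings installed correctly before building engines:

```python
# Quick sanity check: the import should succeed and report a TensorRT 10.x build.
import tensorrt as trt

print(trt.__version__)
```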
|
|
|
3. Generate a HuggingFace user access token
|
To download the Stable Diffusion 3.5 model checkpoints, please request access on the [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), [Stable Diffusion 3.5 Large Depth ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-depth), [Stable Diffusion 3.5 Large Canny ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-canny), and [Stable Diffusion 3.5 Large Blur ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-blur) pages.
|
You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See the [instructions](https://huggingface.co/docs/hub/security-tokens).
|
|
|
```bash |
|
export HF_TOKEN=<your access token> |
|
``` |
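
Optionally, the token and gated-repository access can be verified with `huggingface_hub` before running the demo. This is a minimal sketch that simply queries one of the gated repositories and fails if access has not been granted:

```python
# Minimal access check using the exported HF_TOKEN.
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
# Raises an error if the token is invalid or access to the gated repo was not granted.
info = api.model_info("stabilityai/stable-diffusion-3.5-large-controlnet-depth")
print(info.id)
```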
|
|
|
4. Perform TensorRT optimized inference: |
|
|
|
- **Stable Diffusion 3.5 Large Depth ControlNet in BF16 precision** |
|
|
|
```shell
|
python3 demo_controlnet_sd35.py \ |
|
"a photo of a man" \ |
|
--version=3.5-large \ |
|
--bf16 \ |
|
--controlnet-type depth \ |
|
--download-onnx-models \ |
|
--denoising-steps=40 \ |
|
--guidance-scale 4.5 \ |
|
--build-static-batch \ |
|
--use-cuda-graph \ |
|
--hf-token=$HF_TOKEN |
|
``` |
|
|
|
- **Stable Diffusion 3.5 Large Canny ControlNet in BF16 precision** |
|
|
|
```shell
|
python3 demo_controlnet_sd35.py \ |
|
"A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms" \ |
|
--version=3.5-large \ |
|
--bf16 \ |
|
--controlnet-type canny \ |
|
--download-onnx-models \ |
|
--denoising-steps=60 \ |
|
--guidance-scale 3.5 \ |
|
--build-static-batch \ |
|
--use-cuda-graph \ |
|
--hf-token=$HF_TOKEN |
|
``` |
|
|
|
|
|
- **Stable Diffusion 3.5 Large Blur ControlNet in BF16 precision** |
|
|
|
```shell
|
python3 demo_controlnet_sd35.py \ |
|
"generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater" \ |
|
--version=3.5-large \ |
|
--bf16 \ |
|
--controlnet-type blur \ |
|
--download-onnx-models \ |
|
--denoising-steps=60 \ |
|
--guidance-scale 3.5 \ |
|
--build-static-batch \ |
|
--use-cuda-graph \ |
|
--hf-token=$HF_TOKEN |
|
``` |
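
In all three invocations, `--download-onnx-models` downloads the pre-exported ONNX models rather than exporting them locally, `--build-static-batch` builds the TensorRT engines for a fixed batch size, and `--use-cuda-graph` enables CUDA Graphs to reduce kernel-launch overhead; the commands differ only in the prompt, `--controlnet-type`, step count, and guidance scale.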
|
|