---
pipeline_tag: text-to-image
inference: false
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- tensorrt
- sd3.5-large
- text-to-image
- depth
- canny
- blur
- controlnet
- onnx
extra_gated_prompt: >-
  By clicking "Agree", you agree to the [License
  Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md)
  and acknowledge Stability AI's [Privacy
  Policy](https://stability.ai/privacy-policy).
extra_gated_fields:
  Name: text
  Email: text
  Country: country
  Organization or Affiliation: text
  Receive email updates and promotions on Stability AI products, services, and research?:
    type: select
    options:
      - 'Yes'
      - 'No'
  What do you intend to use the model for?:
    type: select
    options:
      - Research
      - Personal use
      - Creative Professional
      - Startup
      - Enterprise
  I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox
language:
- en
---
# Stable Diffusion 3.5 Large ControlNet TensorRT
## Introduction
This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Large ControlNets**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model.
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that offers improved image quality, typography, complex-prompt understanding, and resource efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications.
The following control types are available:
- Canny - Use a Canny edge map to guide the structure of the generated image (see the sketch after this list). This is especially useful for illustrations, but works with all styles.
- Depth - Use a depth map, generated by DepthFM, to guide generation. Example use cases include generating architectural renderings or texturing 3D assets.
- Blur - Use a blurred input to perform extremely high-fidelity upscaling. A common use case is to tile an input image, apply the ControlNet to each tile, and merge the tiles to produce a higher-resolution image.
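As an illustration of the kind of conditioning input the Canny ControlNet consumes, here is a minimal sketch that produces an edge map with OpenCV. The file names and threshold values are hypothetical placeholders, not settings required by the demo below.
```python
# Illustrative preprocessing sketch: build a Canny edge map of the kind
# the Canny ControlNet consumes. File names and thresholds are placeholders.
import cv2

image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # load the source image as grayscale
edges = cv2.Canny(image, 100, 200)                      # low/high hysteresis thresholds
cv2.imwrite("canny_control.png", edges)                 # save the edge map as a control image
```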
## Model Details
### Model Description
This repository holds the ONNX exports of the Depth, Canny, and Blur ControlNet models in BF16 precision.
## Performance using TensorRT 10.13
### Depth ControlNet: Timings for 40 steps at 1024x1024
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 40 | VAE Decoder | Total |
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------|
| H100 | BF16 | 74.97 ms | 11.87 ms | 4.90 ms | 8.82 ms | 18839.01 ms | 117.38 ms | 19097.19 ms |
### Canny ControlNet: Timings for 60 steps at 1024x1024
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total |
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------|
| H100 | BF16 | 78.50 ms | 12.29 ms | 5.08 ms | 8.65 ms | 28057.08 ms | 106.49 ms | 28306.20 ms |
### Blur ControlNet: Timings for 60 steps at 1024x1024
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total |
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------|
| H100 | BF16 | 74.48 ms | 11.71 ms | 4.86 ms | 8.80 ms | 28604.26 ms | 113.24 ms | 28859.06 ms |
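The MMDiT denoising loop dominates the end-to-end time, so dividing the MMDiT column by the step count gives a rough per-step latency. A back-of-the-envelope calculation using the H100 BF16 numbers from the tables above:
```python
# Per-step MMDiT latency derived from the H100 BF16 timings above.
timings = {"depth": (18839.01, 40), "canny": (28057.08, 60), "blur": (28604.26, 60)}
for name, (mmdit_ms, steps) in timings.items():
    print(f"{name}: {mmdit_ms / steps:.1f} ms per denoising step")
# depth: 471.0, canny: 467.6, blur: 476.7 -- roughly constant per step
```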
## Usage Example
1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) to launch a TensorRT NGC container.
```shell
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git checkout release/sd35
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash
```
2. Install libraries and requirements
```shell
cd demo/Diffusion
python3 -m pip install --upgrade pip
pip3 install -r requirements.txt
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12
```
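As an optional sanity check (not part of the official setup), you can confirm that the TensorRT Python bindings import cleanly and report the expected version:
```python
# Optional sanity check: the timings above were measured with TensorRT 10.13.
import tensorrt as trt
print(trt.__version__)
```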
3. Generate a HuggingFace user access token
To download the Stable Diffusion 3.5 model checkpoints, please request access on the [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), [Stable Diffusion 3.5 Large Depth ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-depth), [Stable Diffusion 3.5 Large Canny ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-canny), and [Stable Diffusion 3.5 Large Blur ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-blur) pages.
You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See the [instructions](https://huggingface.co/docs/hub/security-tokens).
```bash
export HF_TOKEN=<your access token>
```
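Optionally, you can verify the token grants read access before building engines. This sketch assumes the `huggingface_hub` package is available in the container (the demo requirements should already provide it for model downloads):
```python
# Optional: verify the exported token is valid for HuggingFace Hub reads.
import os
from huggingface_hub import whoami

info = whoami(token=os.environ["HF_TOKEN"])  # raises an error if the token is invalid
print(f"Authenticated as: {info['name']}")
```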
4. Perform TensorRT-optimized inference:
- **Stable Diffusion 3.5 Large Depth ControlNet in BF16 precision**
```bash
python3 demo_controlnet_sd35.py \
"a photo of a man" \
--version=3.5-large \
--bf16 \
--controlnet-type depth \
--download-onnx-models \
--denoising-steps=40 \
--guidance-scale 4.5 \
--build-static-batch \
--use-cuda-graph \
--hf-token=$HF_TOKEN
```
- **Stable Diffusion 3.5 Large Canny ControlNet in BF16 precision**
```bash
python3 demo_controlnet_sd35.py \
"A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms" \
--version=3.5-large \
--bf16 \
--controlnet-type canny \
--download-onnx-models \
--denoising-steps=60 \
--guidance-scale 3.5 \
--build-static-batch \
--use-cuda-graph \
--hf-token=$HF_TOKEN
```
- **Stable Diffusion 3.5 Large Blur ControlNet in BF16 precision**
```bash
python3 demo_controlnet_sd35.py \
"generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater" \
--version=3.5-large \
--bf16 \
--controlnet-type blur \
--download-onnx-models \
--denoising-steps=60 \
--guidance-scale 3.5 \
--build-static-batch \
--use-cuda-graph \
--hf-token=$HF_TOKEN
```
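Note that the `--denoising-steps` and `--guidance-scale` values in these commands match the settings used for the benchmark timings above (40 steps for Depth, 60 for Canny and Blur); the remaining flags are shared across all three variants.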