|
--- |
|
pipeline_tag: text-to-image |
|
inference: false |
|
license: other |
|
license_name: stabilityai-ai-community |
|
license_link: LICENSE.md |
|
tags: |
|
- tensorrt |
|
- sd3.5-large |
|
- text-to-image |
|
- depth |
|
- canny |
|
- blur |
|
- controlnet |
|
- onnx |
|
extra_gated_prompt: >- |
|
By clicking "Agree", you agree to the [License |
|
Agreement](https://huggingface.co/stabilityai/stable-diffusion-3.5-large/blob/main/LICENSE.md) |
|
and acknowledge Stability AI's [Privacy |
|
Policy](https://stability.ai/privacy-policy). |
|
extra_gated_fields: |
|
Name: text |
|
Email: text |
|
Country: country |
|
Organization or Affiliation: text |
|
Receive email updates and promotions on Stability AI products, services, and research?: |
|
type: select |
|
options: |
|
- 'Yes' |
|
- 'No' |
|
What do you intend to use the model for?: |
|
type: select |
|
options: |
|
- Research |
|
- Personal use |
|
- Creative Professional |
|
- Startup |
|
- Enterprise |
|
I agree to the License Agreement and acknowledge Stability AI's Privacy Policy: checkbox |
|
language: |
|
- en |
|
--- |
|
|
|
# Stable Diffusion 3.5 Large ControlNet TensorRT |
|
## Introduction |
|
|
|
This repository hosts the **TensorRT-optimized version** of **Stable Diffusion 3.5 Large ControlNets**, developed in collaboration between [Stability AI](https://stability.ai) and [NVIDIA](https://huggingface.co/nvidia). This implementation leverages NVIDIA's TensorRT deep learning inference library to deliver significant performance improvements while maintaining the exceptional image quality of the original model. |
|
|
|
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. The TensorRT optimization makes these capabilities accessible for production deployment and real-time applications. |
|
|
|
The following control types are available: |
|
|
|
- Canny - Use a Canny edge map to guide the structure of the generated image. This is especially useful for illustrations, but works with all styles. |
|
|
|
- Depth - Use a depth map, generated by DepthFM, to guide generation. Example use cases include generating architectural renderings or texturing 3D assets.
|
|
|
- Blur - Can be used to perform extremely high-fidelity upscaling. A common use case is to tile an input image, apply the ControlNet to each tile, and merge the tiles to produce a higher-resolution image (see the preprocessing sketch below).
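
Each ControlNet consumes a conditioning image of the matching type. As a rough illustration only (the demo script in the usage example below prepares its own conditioning inputs), a Canny edge map or a blur condition could be produced with OpenCV; the file names and parameters here are placeholders:

```python
# Illustrative preprocessing sketch, not part of the TensorRT demo.
# File names and thresholds are placeholders.
import cv2

image = cv2.imread("input.png")  # placeholder input image

# Canny: an edge map that constrains the structure of the generated image.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("canny_control.png", edges)

# Blur: a heavily blurred copy of the input, used as the upscaling condition.
blurred = cv2.GaussianBlur(image, ksize=(51, 51), sigmaX=0)
cv2.imwrite("blur_control.png", blurred)
```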
|
|
|
## Model Details |
|
|
|
### Model Description |
|
This repository holds the ONNX exports of the Depth, Canny, and Blur ControlNet models in BF16 precision.
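
To check what each export expects, the graphs can be inspected with the `onnx` Python package. This is a minimal sketch; the file name is a placeholder for the actual ONNX file downloaded from this repository:

```python
# Minimal sketch: list the inputs of one of the exported ControlNet graphs.
# "controlnet_depth.onnx" is a placeholder for the real file path.
import onnx

model = onnx.load("controlnet_depth.onnx")
for tensor in model.graph.input:
    shape = [d.dim_value or d.dim_param for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, shape)
```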
|
|
|
|
|
## Performance using TensorRT 10.13 |
|
#### Depth ControlNet: Timings for 40 steps at 1024x1024 |
|
|
|
|
|
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 40 | VAE Decoder | Total | |
|
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------| |
|
| H100 | BF16 | 74.97 ms | 11.87 ms | 4.90 ms | 8.82 ms | 18839.01 ms | 117.38 ms | 19097.19 ms | |
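
For a quick reading of this row, the per-step denoiser cost and an approximate end-to-end throughput follow directly from the table:

```python
# Worked example using the Depth ControlNet row (H100, BF16, 40 steps).
mmdit_total_ms = 18839.01   # "MMDiT x 40" column
total_ms = 19097.19         # "Total" column

print(mmdit_total_ms / 40)  # ~471 ms per denoising step
print(60_000 / total_ms)    # ~3.1 images per minute end to end
```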
|
|
|
#### Canny ControlNet: Timings for 60 steps at 1024x1024 |
|
|
|
|
|
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total | |
|
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------| |
|
| H100 | BF16 | 78.50 ms | 12.29 ms | 5.08 ms | 8.65 ms | 28057.08 ms | 106.49 ms | 28306.20 ms | |
|
|
|
|
|
#### Blur ControlNet: Timings for 60 steps at 1024x1024 |
|
|
|
| Accelerator | Precision | VAE Encoder | CLIP-G | CLIP-L | T5 | MMDiT x 60 | VAE Decoder | Total | |
|
|-------------|-----------|-------------|------------|--------------|--------------|-----------------------|---------------------|------------------------| |
|
| H100 | BF16 | 74.48 ms | 11.71 ms | 4.86 ms | 8.80 ms | 28604.26 ms | 113.24 ms | 28859.06 ms | |
|
|
|
|
|
|
|
## Usage Example |
|
1. Follow the [setup instructions](https://github.com/NVIDIA/TensorRT/blob/release/sd35/demo/Diffusion/README.md) for launching a TensorRT NGC container.
|
```shell |
|
git clone https://github.com/NVIDIA/TensorRT.git |
|
cd TensorRT |
|
git checkout release/sd35 |
|
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:25.01-py3 /bin/bash |
|
``` |
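
The remaining steps are run inside this container, with the cloned repository mounted at `/workspace`.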
|
|
|
|
|
2. Install libraries and requirements |
|
```shell |
|
cd demo/Diffusion |
|
python3 -m pip install --upgrade pip |
|
pip3 install -r requirements.txt |
|
python3 -m pip install --pre --upgrade --extra-index-url https://pypi.nvidia.com tensorrt-cu12 |
|
``` |
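
Optionally, confirm that the TensorRT Python bindings installed correctly before building engines:

```python
# Quick sanity check: the import should succeed and report a TensorRT 10.x build.
import tensorrt as trt

print(trt.__version__)
```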
|
|
|
3. Generate a HuggingFace user access token
|
To download the Stable Diffusion 3.5 model checkpoints, please request access on the [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large), [Stable Diffusion 3.5 Large Depth ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-depth), [Stable Diffusion 3.5 Large Canny ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-canny), and [Stable Diffusion 3.5 Large Blur ControlNet](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-controlnet-blur) pages.
|
You will then need to obtain a `read` access token for the HuggingFace Hub and export it as shown below. See the [instructions](https://huggingface.co/docs/hub/security-tokens).
|
|
|
```bash |
|
export HF_TOKEN=<your access token> |
|
``` |
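
Optionally, the token and gated-repository access can be verified with `huggingface_hub` before running the demo. This is a minimal sketch that simply queries one of the gated repositories and fails if access has not been granted:

```python
# Minimal access check using the exported HF_TOKEN.
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
# Raises an error if the token is invalid or access to the gated repo was not granted.
info = api.model_info("stabilityai/stable-diffusion-3.5-large-controlnet-depth")
print(info.id)
```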
|
|
|
4. Perform TensorRT optimized inference: |
|
|
|
- **Stable Diffusion 3.5 Large Depth ControlNet in BF16 precision** |
|
|
|
```shell
|
python3 demo_controlnet_sd35.py \ |
|
"a photo of a man" \ |
|
--version=3.5-large \ |
|
--bf16 \ |
|
--controlnet-type depth \ |
|
--download-onnx-models \ |
|
--denoising-steps=40 \ |
|
--guidance-scale 4.5 \ |
|
--build-static-batch \ |
|
--use-cuda-graph \ |
|
--hf-token=$HF_TOKEN |
|
``` |
|
|
|
- **Stable Diffusion 3.5 Large Canny ControlNet in BF16 precision** |
|
|
|
```shell
|
python3 demo_controlnet_sd35.py \ |
|
"A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms" \ |
|
--version=3.5-large \ |
|
--bf16 \ |
|
--controlnet-type canny \ |
|
--download-onnx-models \ |
|
--denoising-steps=60 \ |
|
--guidance-scale 3.5 \ |
|
--build-static-batch \ |
|
--use-cuda-graph \ |
|
--hf-token=$HF_TOKEN |
|
``` |
|
|
|
|
|
- **Stable Diffusion 3.5 Large Blur ControlNet in BF16 precision** |
|
|
|
```shell
|
python3 demo_controlnet_sd35.py \ |
|
"generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater" \ |
|
--version=3.5-large \ |
|
--bf16 \ |
|
--controlnet-type blur \ |
|
--download-onnx-models \ |
|
--denoising-steps=60 \ |
|
--guidance-scale 3.5 \ |
|
--build-static-batch \ |
|
--use-cuda-graph \ |
|
--hf-token=$HF_TOKEN |
|
``` |
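
In all three invocations, `--download-onnx-models` downloads the pre-exported ONNX models rather than exporting them locally, `--build-static-batch` builds the TensorRT engines for a fixed batch size, and `--use-cuda-graph` enables CUDA Graphs to reduce kernel-launch overhead; the commands differ only in the prompt, `--controlnet-type`, step count, and guidance scale.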
|
|