---
license: apache-2.0
---

# Repos
https://github.com/mit-han-lab/deepcompressor

# Installation
https://github.com/mit-han-lab/deepcompressor/issues/56

https://github.com/nunchaku-tech/deepcompressor/issues/80

# Windows
https://learn.microsoft.com/en-us/windows/wsl/install

https://www.anaconda.com/docs/getting-started/miniconda/install

# Environment
Python 3.12

CUDA 12.8

torch 2.7

diffusers: https://github.com/huggingface/diffusers

transformers 4.51

# Calibration

https://github.com/nunchaku-tech/deepcompressor/blob/main/examples/diffusion/README.md#step-2-calibration-dataset-preparation

# Quantization

https://github.com/nunchaku-tech/deepcompressor/blob/main/examples/diffusion/README.md#step-3-model-quantization

Model Path: https://github.com/nunchaku-tech/deepcompressor/issues/70#issuecomment-2788155233

Save model: `--save-model true` or `--save-model /PATH/TO/CHECKPOINT/DIR`

Example: `python -m deepcompressor.app.diffusion.ptq examples/diffusion/configs/model/flux.1-dev.yaml examples/diffusion/configs/svdquant/nvfp4.yaml`
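
For scripted runs, the same command can be assembled as an argument list. This is a minimal sketch that assumes only the module name, config paths, and `--save-model` flag shown above; `build_ptq_command` is a hypothetical helper and not part of deepcompressor.

```python
import sys


def build_ptq_command(model_config, quant_config, save_model=None):
    """Assemble the PTQ command line as an argument list (hypothetical helper)."""
    cmd = [sys.executable, "-m", "deepcompressor.app.diffusion.ptq", model_config, quant_config]
    if save_model is not None:
        # `save_model` is either "true" or a checkpoint directory, as described above.
        cmd += ["--save-model", str(save_model)]
    return cmd


cmd = build_ptq_command(
    "examples/diffusion/configs/model/flux.1-dev.yaml",
    "examples/diffusion/configs/svdquant/nvfp4.yaml",
    save_model="/PATH/TO/CHECKPOINT/DIR",
)
print(" ".join(cmd))
# To launch it from the repository root: subprocess.run(cmd, check=True)
```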

Folder structure: mirror the layout of the reference repositories.

- [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main)

- [black-forest-labs/FLUX.1-Kontext-dev](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/tree/main)

---

# Blockers
1) `NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.`

Potential fix in `app/diffusion/pipeline/config.py`:
```python
    @staticmethod
    def _default_build(
        name: str, path: str, dtype: str | torch.dtype, device: str | torch.device, shift_activations: bool
    ) -> DiffusionPipeline:
        if not path:
            if name == "sdxl":
                path = "stabilityai/stable-diffusion-xl-base-1.0"
            elif name == "sdxl-turbo":
                path = "stabilityai/sdxl-turbo"
            elif name == "pixart-sigma":
                path = "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
            elif name == "flux.1-kontext-dev":
                path = "black-forest-labs/FLUX.1-Kontext-dev"
            elif name == "flux.1-dev":
                path = "black-forest-labs/FLUX.1-dev"
            elif name == "flux.1-canny-dev":
                path = "black-forest-labs/FLUX.1-Canny-dev"
            elif name == "flux.1-depth-dev":
                path = "black-forest-labs/FLUX.1-Depth-dev"
            elif name == "flux.1-fill-dev":
                path = "black-forest-labs/FLUX.1-Fill-dev"
            elif name == "flux.1-schnell":
                path = "black-forest-labs/FLUX.1-schnell"
            else:
                raise ValueError(f"Path for {name} is not specified.")
        if name in ["flux.1-kontext-dev"]:
            pipeline = FluxKontextPipeline.from_pretrained(path, torch_dtype=dtype)
        elif name in ["flux.1-canny-dev", "flux.1-depth-dev"]:
            pipeline = FluxControlPipeline.from_pretrained(path, torch_dtype=dtype)
        elif name == "flux.1-fill-dev":
            pipeline = FluxFillPipeline.from_pretrained(path, torch_dtype=dtype)
        elif name.startswith("sana-"):
            if dtype == torch.bfloat16:
                pipeline = SanaPipeline.from_pretrained(path, variant="bf16", torch_dtype=dtype, use_safetensors=True)
                pipeline.vae.to(dtype)
                pipeline.text_encoder.to(dtype)
            else:
                pipeline = SanaPipeline.from_pretrained(path, torch_dtype=dtype)
        else:
            pipeline = AutoPipelineForText2Image.from_pretrained(path, torch_dtype=dtype)

        # Debug output
        print(">>> DEVICE:", device)
        print(">>> PIPELINE TYPE:", type(pipeline))
    
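        # Move each component to the target device. `Module.to()` raises
        # NotImplementedError for meta tensors, so fall back to `to_empty()`.
        # `device` is keyword-only in `to_empty()`; the positional call
        # `to_empty(device)` raises the TypeError seen in the debug log below.
        # `to_empty()` allocates uninitialized storage, so the real weights
        # still have to be loaded afterwards.
        for component_name in ["unet", "transformer", "vae", "text_encoder"]:
            module = getattr(pipeline, component_name, None)
            if isinstance(module, torch.nn.Module):
                try:
                    print(f">>> Moving {component_name} to {device}")
                    module.to(device)
                except NotImplementedError as e:
                    print(f">>> WARNING: {component_name}.to({device}) failed: {e}")
                    print(f">>> Falling back to {component_name}.to_empty(device={device})")
                    module.to_empty(device=device)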
    
        # Identify main model (for patching)
        model = getattr(pipeline, "unet", None) or getattr(pipeline, "transformer", None)
        if model is not None:
            replace_fused_linear_with_concat_linear(model)
            replace_up_block_conv_with_concat_conv(model)
            if shift_activations:
                shift_input_activations(model)
        else:
            print(">>> WARNING: No model (unet/transformer) found for patching")
    
        return pipeline
```

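Debug log, from a run that called `module.to_empty(device)` positionally (`device` is keyword-only, hence the `takes 1 positional argument` warnings):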
```
25-07-22 20:11:56 | I | === Start Evaluating ===
25-07-22 20:11:56 | I | * Building diffusion model pipeline
Loading pipeline components...:   0%|                                                             | 0/7 [00:00<?, ?it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 18.92it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 7/7 [00:00<00:00,  9.50it/s]
>>> DEVICE: cuda
>>> PIPELINE TYPE: <class 'diffusers.pipelines.flux.pipeline_flux_kontext.FluxKontextPipeline'>
>>> Moving transformer to cuda using to_empty()
>>> WARNING: transformer.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
>>> Falling back to transformer.to(cuda)
>>> ERROR: transformer.to(cuda) also failed: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
>>> Moving vae to cuda using to_empty()
>>> WARNING: vae.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
>>> Falling back to vae.to(cuda)
>>> Moving text_encoder to cuda using to_empty()
>>> WARNING: text_encoder.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
>>> Falling back to text_encoder.to(cuda)
25-07-22 20:11:59 | I |   Replacing fused Linear with ConcatLinear.
25-07-22 20:11:59 | I |     + Replacing fused Linear in single_transformer_blocks.0 with ConcatLinear.
25-07-22 20:11:59 | I |       - in_features = 3072/15360
25-07-22 20:11:59 | I |       - out_features = 3072
25-07-22 20:11:59 | I |     + Replacing fused Linear in single_transformer_blocks.1 with ConcatLinear.
25-07-22 20:11:59 | I |       - in_features = 3072/15360
25-07-22 20:11:59 | I |       - out_features = 3072
25-07-22 20:11:59 | I |     + Replacing fused Linear in single_transformer_blocks.2 with ConcatLinear.
25-07-22 20:11:59 | I |       - in_features = 3072/15360
25-07-22 20:11:59 | I |       - out_features = 3072
```
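
The log shows two separate problems: `to_empty()` rejects a positional device, and `.to()` cannot copy meta tensors because they hold no data. The sketch below reproduces both on a toy `nn.Linear` and then materializes it. `materialize` is a hypothetical helper, the toy layer only stands in for a pipeline component, and the sketch assumes PyTorch 2.x with the real weights available as a state dict.

```python
import torch
from torch import nn


def materialize(module, state_dict, device):
    """Allocate storage for a meta module, then load real weights (hypothetical helper)."""
    module.to_empty(device=device)      # `device` is keyword-only; storage is uninitialized
    module.load_state_dict(state_dict)  # overwrite the uninitialized storage with real weights
    return module


reference = nn.Linear(4, 2)                  # stands in for weights loaded from a checkpoint
meta_layer = nn.Linear(4, 2, device="meta")  # parameters have shapes but no data

try:
    meta_layer.to("cpu")
except NotImplementedError as err:
    print("to() failed:", err)

try:
    meta_layer.to_empty("cpu")  # positional device, as in the log above
except TypeError as err:
    print("to_empty() failed:", err)

materialize(meta_layer, reference.state_dict(), "cpu")
print(torch.equal(meta_layer.weight, reference.weight))
```

Calling `to_empty()` alone would leave the module with arbitrary values, which is why the helper reloads the state dict immediately.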

2) `KeyError: <class 'diffusers.models.transformers.transformer_flux.FluxAttention'>`

Potential fix in `app/diffusion/nn/struct.py`:
```python
    @staticmethod
    def _default_construct(
        module: Attention,
        /,
        parent: tp.Optional["DiffusionTransformerBlockStruct"] = None,
        fname: str = "",
        rname: str = "",
        rkey: str = "",
        idx: int = 0,
        **kwargs,
    ) -> "DiffusionAttentionStruct":
        # Requires: from diffusers.models.transformers.transformer_flux import FluxAttention
        if isinstance(module, FluxAttention):
            # FluxAttention exposes `heads` (not `num_heads`) and `to_q`/`to_k`/`to_v`.
            q_proj, k_proj, v_proj = module.to_q, module.to_k, module.to_v
            q_proj_rname, k_proj_rname, v_proj_rname = "to_q", "to_k", "to_v"
            # Optional add_* projections; they may be absent on some blocks.
            add_q_proj = getattr(module, "add_q_proj", None)
            add_k_proj = getattr(module, "add_k_proj", None)
            add_v_proj = getattr(module, "add_v_proj", None)
            add_o_proj = getattr(module, "to_add_out", None)
            add_q_proj_rname = "add_q_proj" if add_q_proj is not None else ""
            add_k_proj_rname = "add_k_proj" if add_k_proj is not None else ""
            add_v_proj_rname = "add_v_proj" if add_v_proj is not None else ""
            add_o_proj_rname = "to_add_out" if add_o_proj is not None else ""
            # The output projection, `with_rope`, and the head counts are resolved by
            # the shared code after this if/elif chain, so they are not set here.
        elif module.is_cross_attention:
            q_proj, k_proj, v_proj = module.to_q, None, None
            add_q_proj, add_k_proj, add_v_proj, add_o_proj = None, module.to_k, module.to_v, None
            q_proj_rname, k_proj_rname, v_proj_rname = "to_q", "", ""
            add_q_proj_rname, add_k_proj_rname, add_v_proj_rname, add_o_proj_rname = "", "to_k", "to_v", ""
        else:
            q_proj, k_proj, v_proj = module.to_q, module.to_k, module.to_v
            add_q_proj = getattr(module, "add_q_proj", None)
            add_k_proj = getattr(module, "add_k_proj", None)
            add_v_proj = getattr(module, "add_v_proj", None)
            add_o_proj = getattr(module, "to_add_out", None)
            q_proj_rname, k_proj_rname, v_proj_rname = "to_q", "to_k", "to_v"
            add_q_proj_rname, add_k_proj_rname, add_v_proj_rname = "add_q_proj", "add_k_proj", "add_v_proj"
            add_o_proj_rname = "to_add_out"
        if getattr(module, "to_out", None) is not None:
            o_proj = module.to_out[0]
            o_proj_rname = "to_out.0"
            assert isinstance(o_proj, nn.Linear)
        elif parent is not None:
            assert isinstance(parent.module, FluxSingleTransformerBlock)
            assert isinstance(parent.module.proj_out, ConcatLinear)
            assert len(parent.module.proj_out.linears) == 2
            o_proj = parent.module.proj_out.linears[0]
            o_proj_rname = ".proj_out.linears.0"
        else:
            raise RuntimeError("Cannot find the output projection.")
        if isinstance(module.processor, DiffusionAttentionProcessor):
            with_rope = module.processor.rope is not None
        elif module.processor.__class__.__name__.startswith("Flux"):
            with_rope = True
        else:
            with_rope = False  # TODO: fix for other processors
        config = AttentionConfigStruct(
            hidden_size=q_proj.weight.shape[1],
            add_hidden_size=add_k_proj.weight.shape[1] if add_k_proj is not None else 0,
            inner_size=q_proj.weight.shape[0],
            num_query_heads=module.heads,
            num_key_value_heads=module.to_k.weight.shape[0] // (module.to_q.weight.shape[0] // module.heads),
            with_qk_norm=module.norm_q is not None,
            with_rope=with_rope,
            linear_attn=isinstance(module.processor, SanaLinearAttnProcessor2_0),
        )
        return DiffusionAttentionStruct(
            module=module,
            parent=parent,
            fname=fname,
            idx=idx,
            rname=rname,
            rkey=rkey,
            config=config,
            q_proj=q_proj,
            k_proj=k_proj,
            v_proj=v_proj,
            o_proj=o_proj,
            add_q_proj=add_q_proj,
            add_k_proj=add_k_proj,
            add_v_proj=add_v_proj,
            add_o_proj=add_o_proj,
            q=None,  # TODO: add q, k, v
            k=None,
            v=None,
            q_proj_rname=q_proj_rname,
            k_proj_rname=k_proj_rname,
            v_proj_rname=v_proj_rname,
            o_proj_rname=o_proj_rname,
            add_q_proj_rname=add_q_proj_rname,
            add_k_proj_rname=add_k_proj_rname,
            add_v_proj_rname=add_v_proj_rname,
            add_o_proj_rname=add_o_proj_rname,
            q_rname="",
            k_rname="",
            v_rname="",
        )
```
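
The `num_key_value_heads` expression in `AttentionConfigStruct` derives the key/value head count from the projection widths: the head dimension is the query width divided by the number of query heads, and the key width divided by that head dimension gives the number of key/value heads. A small worked sketch follows; `infer_num_kv_heads` is a hypothetical name, the width 3072 is taken from the log above, and the head counts are assumed for illustration.

```python
def infer_num_kv_heads(q_out_features, k_out_features, num_query_heads):
    """Mirror of `to_k.weight.shape[0] // (to_q.weight.shape[0] // heads)`."""
    head_dim = q_out_features // num_query_heads
    return k_out_features // head_dim


# Equal query and key widths: every query head has its own key/value head.
print(infer_num_kv_heads(3072, 3072, 24))  # 24

# Narrower key projection (grouped-query attention): fewer key/value heads.
print(infer_num_kv_heads(4096, 1024, 32))  # 8
```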

3) ValueError: Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined.

Potential fix in `app/diffusion/dataset/collect/calib.py`:

```python
def collect(config: DiffusionPtqRunConfig, dataset: datasets.Dataset):
    samples_dirpath = os.path.join(config.output.root, "samples")
    caches_dirpath = os.path.join(config.output.root, "caches")
    os.makedirs(samples_dirpath, exist_ok=True)
    os.makedirs(caches_dirpath, exist_ok=True)
    caches = []

    pipeline = config.pipeline.build()
    model = pipeline.unet if hasattr(pipeline, "unet") else pipeline.transformer
    assert isinstance(model, nn.Module)
    model.register_forward_hook(CollectHook(caches=caches), with_kwargs=True)

    batch_size = config.eval.batch_size
    print(f"In total {len(dataset)} samples")
    print(f"Evaluating with batch size {batch_size}")
    pipeline.set_progress_bar_config(desc="Sampling", leave=False, dynamic_ncols=True, position=1)
    for batch in tqdm(
        dataset.iter(batch_size=batch_size, drop_last_batch=False),
        desc="Data",
        leave=False,
        dynamic_ncols=True,
        total=(len(dataset) + batch_size - 1) // batch_size,
    ):
        filenames = batch["filename"]
        prompts = batch["prompt"]
        seeds = [hash_str_to_int(name) for name in filenames]
        generators = [torch.Generator(device=pipeline.device).manual_seed(seed) for seed in seeds]
        pipeline_kwargs = config.eval.get_pipeline_kwargs()

        task = config.pipeline.task
        control_root = config.eval.control_root
        if task in ["canny-to-image", "depth-to-image", "inpainting"]:
            controls = get_control(
                task,
                batch["image"],
                names=batch["filename"],
                data_root=os.path.join(
                    control_root, collect_config.dataset_name, f"{dataset.config_name}-{config.eval.num_samples}"
                ),
            )
            if task == "inpainting":
                pipeline_kwargs["image"] = controls[0]
                pipeline_kwargs["mask_image"] = controls[1]
            else:
                pipeline_kwargs["control_image"] = controls

        # Handle meta tensors: move the whole pipeline, else move components one by one.
        try:
            pipeline = pipeline.to("cuda")
        except NotImplementedError:
            for attr in ("transformer", "text_encoder", "text_encoder_2", "vae"):
                component = getattr(pipeline, attr, None)
                if component is None:
                    continue
                try:
                    setattr(pipeline, attr, component.to("cuda"))
                except NotImplementedError:
                    # to_empty() allocates uninitialized storage on the device;
                    # the real weights still have to be loaded afterwards.
                    setattr(pipeline, attr, component.to_empty(device="cuda"))

        result_images = pipeline(prompt=prompts, generator=generators, **pipeline_kwargs).images
        num_guidances = (len(caches) // batch_size) // config.eval.num_steps
        num_steps = len(caches) // (batch_size * num_guidances)
        assert (
            len(caches) == batch_size * num_steps * num_guidances
        ), f"Unexpected number of caches: {len(caches)} != {batch_size} * {config.eval.num_steps} * {num_guidances}"
        for j, (filename, image) in enumerate(zip(filenames, result_images, strict=True)):
            image.save(os.path.join(samples_dirpath, f"{filename}.png"))
            for s in range(num_steps):
                for g in range(num_guidances):
                    c = caches[s * batch_size * num_guidances + g * batch_size + j]
                    c["filename"] = filename
                    c["step"] = s
                    c["guidance"] = g
                    c = tree_map(lambda x: process(x), c)
                    torch.save(c, os.path.join(caches_dirpath, f"{filename}-{s:05d}-{g}.pt"))
        caches.clear()
```
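
The cache bookkeeping at the end of `collect` implies a flat layout ordered by step, then guidance branch, then sample within the batch. The sketch below replays that arithmetic with small illustrative numbers to show that the index formula visits every cache entry exactly once. `cache_index` is a hypothetical name, and the one-entry-per-sample-per-forward-call layout is an assumption read off the indexing, not verified against `CollectHook`.

```python
def cache_index(step, guidance, sample, batch_size, num_guidances):
    """Flat position of one cache entry: step-major, then guidance branch, then sample."""
    return step * batch_size * num_guidances + guidance * batch_size + sample


batch_size = 2
configured_steps = 3  # plays the role of config.eval.num_steps
num_caches = 12       # e.g. 2 samples * 3 steps * 2 guidance branches

# Same derivation as in collect(): recover the guidance count, then the step count.
num_guidances = (num_caches // batch_size) // configured_steps
num_steps = num_caches // (batch_size * num_guidances)
assert num_caches == batch_size * num_steps * num_guidances

indices = [
    cache_index(s, g, j, batch_size, num_guidances)
    for s in range(num_steps)
    for g in range(num_guidances)
    for j in range(batch_size)
]
print(num_guidances, num_steps)            # 2 3
print(indices == list(range(num_caches)))  # True
```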

References

https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_flux.py#L266

https://github.com/nunchaku-tech/deepcompressor/blob/main/deepcompressor/nn/struct/attn.py

https://github.com/nunchaku-tech/nunchaku/blob/main/examples/flux.1-kontext-dev.py

https://github.com/nunchaku-tech/nunchaku/commit/b99fb8be615bc98c6915bbe06a1e0092cbc074a5

https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux/pipeline_flux_kontext.py

https://github.com/nunchaku-tech/deepcompressor/issues/91

---

# Dependencies
https://github.com/Dao-AILab/flash-attention

https://github.com/facebookresearch/xformers

https://github.com/openai/CLIP

https://github.com/THUDM/ImageReward

# Wheels

https://huggingface.co/datasets/siraxe/PrecompiledWheels_Torch-2.8-cu128-cp312

https://huggingface.co/lldacing/flash-attention-windows-wheel