---
license: apache-2.0
---
## Repos

- https://github.com/mit-han-lab/deepcompressor
## Installation

- https://github.com/mit-han-lab/deepcompressor/issues/56
- https://github.com/nunchaku-tech/deepcompressor/issues/80
## Windows

- WSL: https://learn.microsoft.com/en-us/windows/wsl/install
- Miniconda: https://www.anaconda.com/docs/getting-started/miniconda/install
## Environment

- python 3.12
- cuda 12.8
- torch 2.7
- diffusers: https://github.com/huggingface/diffusers
- transformers 4.51
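
A quick way to confirm the environment matches these pins (a minimal sketch; exact patch versions may differ):

```python
import torch
import transformers
import diffusers

print(torch.__version__, torch.version.cuda)  # expect 2.7.x / 12.8
print(transformers.__version__)               # expect 4.51.x
print(diffusers.__version__)                  # installed from the GitHub repo above
assert torch.cuda.is_available(), "no CUDA device visible to torch"
```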
## Calibration
## Quantization

- Model path: https://github.com/nunchaku-tech/deepcompressor/issues/70#issuecomment-2788155233
- Save the quantized model with `--save-model true`, or give an explicit directory with `--save-model /PATH/TO/CHECKPOINT/DIR`.
- Example:

```bash
python -m deepcompressor.app.diffusion.ptq examples/diffusion/configs/model/flux.1-dev.yaml examples/diffusion/configs/svdquant/nvfp4.yaml
```
## Folder Structure
## Blockers
- `NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.`

Potential fix, in `app/diffusion/pipeline/config.py`:

```python
@staticmethod
def _default_build(
    name: str, path: str, dtype: str | torch.dtype, device: str | torch.device, shift_activations: bool
) -> DiffusionPipeline:
    if not path:
        if name == "sdxl":
            path = "stabilityai/stable-diffusion-xl-base-1.0"
        elif name == "sdxl-turbo":
            path = "stabilityai/sdxl-turbo"
        elif name == "pixart-sigma":
            path = "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
        elif name == "flux.1-kontext-dev":
            path = "black-forest-labs/FLUX.1-Kontext-dev"
        elif name == "flux.1-dev":
            path = "black-forest-labs/FLUX.1-dev"
        elif name == "flux.1-canny-dev":
            path = "black-forest-labs/FLUX.1-Canny-dev"
        elif name == "flux.1-depth-dev":
            path = "black-forest-labs/FLUX.1-Depth-dev"
        elif name == "flux.1-fill-dev":
            path = "black-forest-labs/FLUX.1-Fill-dev"
        elif name == "flux.1-schnell":
            path = "black-forest-labs/FLUX.1-schnell"
        else:
            raise ValueError(f"Path for {name} is not specified.")
    if name in ["flux.1-kontext-dev"]:
        pipeline = FluxKontextPipeline.from_pretrained(path, torch_dtype=dtype)
    elif name in ["flux.1-canny-dev", "flux.1-depth-dev"]:
        pipeline = FluxControlPipeline.from_pretrained(path, torch_dtype=dtype)
    elif name == "flux.1-fill-dev":
        pipeline = FluxFillPipeline.from_pretrained(path, torch_dtype=dtype)
    elif name.startswith("sana-"):
        if dtype == torch.bfloat16:
            pipeline = SanaPipeline.from_pretrained(path, variant="bf16", torch_dtype=dtype, use_safetensors=True)
            pipeline.vae.to(dtype)
            pipeline.text_encoder.to(dtype)
        else:
            pipeline = SanaPipeline.from_pretrained(path, torch_dtype=dtype)
    else:
        pipeline = AutoPipelineForText2Image.from_pretrained(path, torch_dtype=dtype)
    # Debug output
    print(">>> DEVICE:", device)
    print(">>> PIPELINE TYPE:", type(pipeline))
    # Try to move each component, preferring to_empty() for meta-device modules.
    # NOTE: Module.to_empty() takes `device` as a keyword-only argument, and it
    # allocates *uninitialized* storage, so real weights must be loaded afterwards.
    # The loop variable is `component_name` to avoid shadowing the `name` parameter.
    for component_name in ["unet", "transformer", "vae", "text_encoder"]:
        module = getattr(pipeline, component_name, None)
        if isinstance(module, torch.nn.Module):
            try:
                print(f">>> Moving {component_name} to {device} using to_empty()")
                module.to_empty(device=device)
            except Exception as e:
                print(f">>> WARNING: {component_name}.to_empty(device={device}) failed: {e}")
                try:
                    print(f">>> Falling back to {component_name}.to({device})")
                    module.to(device)
                except Exception as ee:
                    print(f">>> ERROR: {component_name}.to({device}) also failed: {ee}")
    # Identify the main model (for patching)
    model = getattr(pipeline, "unet", None) or getattr(pipeline, "transformer", None)
    if model is not None:
        replace_fused_linear_with_concat_linear(model)
        replace_up_block_conv_with_concat_conv(model)
        if shift_activations:
            shift_input_activations(model)
    else:
        print(">>> WARNING: No model (unet/transformer) found for patching")
    return pipeline
```
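
The debug log below came from an earlier run that called `module.to_empty(device)` positionally; `Module.to_empty()` accepts `device` only as a keyword argument, hence the failure and the fallback to `.to()`. A minimal repro of the underlying meta-tensor behavior:

```python
import torch
import torch.nn as nn

# Parameters created under the meta device have shapes but no storage.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

# layer.to("cpu") would raise:
#   NotImplementedError: Cannot copy out of meta tensor; no data!
# to_empty() instead allocates *uninitialized* storage on the target device,
# so real weights still have to be loaded afterwards.
layer = layer.to_empty(device="cpu")  # `device` is keyword-only
print(layer.weight.shape)  # torch.Size([4, 4]); values are uninitialized
```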
Debug log:

```text
25-07-22 20:11:56 | I | === Start Evaluating ===
25-07-22 20:11:56 | I | * Building diffusion model pipeline
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 18.92it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 7/7 [00:00<00:00, 9.50it/s]
>>> DEVICE: cuda
>>> PIPELINE TYPE: <class 'diffusers.pipelines.flux.pipeline_flux_kontext.FluxKontextPipeline'>
>>> Moving transformer to cuda using to_empty()
>>> WARNING: transformer.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
>>> Falling back to transformer.to(cuda)
>>> ERROR: transformer.to(cuda) also failed: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
>>> Moving vae to cuda using to_empty()
>>> WARNING: vae.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
>>> Falling back to vae.to(cuda)
>>> Moving text_encoder to cuda using to_empty()
>>> WARNING: text_encoder.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
>>> Falling back to text_encoder.to(cuda)
25-07-22 20:11:59 | I | Replacing fused Linear with ConcatLinear.
25-07-22 20:11:59 | I | + Replacing fused Linear in single_transformer_blocks.0 with ConcatLinear.
25-07-22 20:11:59 | I | - in_features = 3072/15360
25-07-22 20:11:59 | I | - out_features = 3072
25-07-22 20:11:59 | I | + Replacing fused Linear in single_transformer_blocks.1 with ConcatLinear.
25-07-22 20:11:59 | I | - in_features = 3072/15360
25-07-22 20:11:59 | I | - out_features = 3072
25-07-22 20:11:59 | I | + Replacing fused Linear in single_transformer_blocks.2 with ConcatLinear.
25-07-22 20:11:59 | I | - in_features = 3072/15360
25-07-22 20:11:59 | I | - out_features = 3072
```
- `KeyError: <class 'diffusers.models.transformers.transformer_flux.FluxAttention'>`

Potential fix, in `app/diffusion/nn/struct.py`:

```python
@staticmethod
def _default_construct(
    module: Attention,
    /,
    parent: tp.Optional["DiffusionTransformerBlockStruct"] = None,
    fname: str = "",
    rname: str = "",
    rkey: str = "",
    idx: int = 0,
    **kwargs,
) -> "DiffusionAttentionStruct":
    # Default all optional projections so every branch below can rely on them.
    o_proj, o_proj_rname = None, ""
    add_q_proj = add_k_proj = add_v_proj = add_o_proj = None
    add_q_proj_rname = add_k_proj_rname = add_v_proj_rname = add_o_proj_rname = ""
    if isinstance(module, FluxAttention):
        # FluxAttention uses 'heads' (not 'num_heads') and shares the head count for q/k/v.
        q_proj, k_proj, v_proj = module.to_q, module.to_k, module.to_v
        q_proj_rname, k_proj_rname, v_proj_rname = "to_q", "to_k", "to_v"
        # Output projection: joint (double) blocks expose `to_out` (an nn.ModuleList);
        # single transformer blocks fuse it into the parent's proj_out instead.
        to_out = getattr(module, "to_out", None)
        if to_out is not None:
            if isinstance(to_out, (list, tuple, nn.ModuleList)):
                o_proj, o_proj_rname = to_out[0], "to_out.0"
            else:
                o_proj, o_proj_rname = to_out, "to_out"
        # Joint blocks also carry the add_* projections for the text stream.
        add_q_proj = getattr(module, "add_q_proj", None)
        add_k_proj = getattr(module, "add_k_proj", None)
        add_v_proj = getattr(module, "add_v_proj", None)
        add_o_proj = getattr(module, "to_add_out", None)
        add_q_proj_rname = "add_q_proj" if add_q_proj is not None else ""
        add_k_proj_rname = "add_k_proj" if add_k_proj is not None else ""
        add_v_proj_rname = "add_v_proj" if add_v_proj is not None else ""
        add_o_proj_rname = "to_add_out" if add_o_proj is not None else ""
    elif module.is_cross_attention:
        q_proj, k_proj, v_proj = module.to_q, None, None
        add_k_proj, add_v_proj = module.to_k, module.to_v
        q_proj_rname, k_proj_rname, v_proj_rname = "to_q", "", ""
        add_k_proj_rname, add_v_proj_rname = "to_k", "to_v"
    else:
        q_proj, k_proj, v_proj = module.to_q, module.to_k, module.to_v
        add_q_proj = getattr(module, "add_q_proj", None)
        add_k_proj = getattr(module, "add_k_proj", None)
        add_v_proj = getattr(module, "add_v_proj", None)
        add_o_proj = getattr(module, "to_add_out", None)
        q_proj_rname, k_proj_rname, v_proj_rname = "to_q", "to_k", "to_v"
        add_q_proj_rname, add_k_proj_rname, add_v_proj_rname = "add_q_proj", "add_k_proj", "add_v_proj"
        add_o_proj_rname = "to_add_out"
    if o_proj is None:
        # Non-Flux branches (and Flux single blocks) resolve the output projection here.
        if getattr(module, "to_out", None) is not None:
            o_proj = module.to_out[0]
            o_proj_rname = "to_out.0"
            assert isinstance(o_proj, nn.Linear)
        elif parent is not None:
            assert isinstance(parent.module, FluxSingleTransformerBlock)
            assert isinstance(parent.module.proj_out, ConcatLinear)
            assert len(parent.module.proj_out.linears) == 2
            o_proj = parent.module.proj_out.linears[0]
            o_proj_rname = ".proj_out.linears.0"
        else:
            raise RuntimeError("Cannot find the output projection.")
    if isinstance(module.processor, DiffusionAttentionProcessor):
        with_rope = module.processor.rope is not None
    elif module.processor.__class__.__name__.startswith("Flux"):
        with_rope = True
    else:
        with_rope = False  # TODO: fix for other processors
    config = AttentionConfigStruct(
        hidden_size=q_proj.weight.shape[1],
        add_hidden_size=add_k_proj.weight.shape[1] if add_k_proj is not None else 0,
        inner_size=q_proj.weight.shape[0],
        num_query_heads=module.heads,
        num_key_value_heads=module.to_k.weight.shape[0] // (module.to_q.weight.shape[0] // module.heads),
        with_qk_norm=module.norm_q is not None,
        with_rope=with_rope,
        linear_attn=isinstance(module.processor, SanaLinearAttnProcessor2_0),
    )
    return DiffusionAttentionStruct(
        module=module,
        parent=parent,
        fname=fname,
        idx=idx,
        rname=rname,
        rkey=rkey,
        config=config,
        q_proj=q_proj,
        k_proj=k_proj,
        v_proj=v_proj,
        o_proj=o_proj,
        add_q_proj=add_q_proj,
        add_k_proj=add_k_proj,
        add_v_proj=add_v_proj,
        add_o_proj=add_o_proj,
        q=None,  # TODO: add q, k, v
        k=None,
        v=None,
        q_proj_rname=q_proj_rname,
        k_proj_rname=k_proj_rname,
        v_proj_rname=v_proj_rname,
        o_proj_rname=o_proj_rname,
        add_q_proj_rname=add_q_proj_rname,
        add_k_proj_rname=add_k_proj_rname,
        add_v_proj_rname=add_v_proj_rname,
        add_o_proj_rname=add_o_proj_rname,
        q_rname="",
        k_rname="",
        v_rname="",
    )
```
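
To see which projection attributes the installed diffusers build actually exposes on `FluxAttention` (the branch above depends on these names), a meta-device probe avoids downloading any weights; the tiny config values below are arbitrary assumptions:

```python
import torch
from diffusers import FluxTransformer2DModel

# Instantiate a tiny FLUX transformer on the meta device purely to inspect
# module names; no weights are allocated. Config values are arbitrary.
with torch.device("meta"):
    model = FluxTransformer2DModel(
        patch_size=1, in_channels=4, num_layers=1, num_single_layers=1,
        attention_head_dim=8, num_attention_heads=2,
        joint_attention_dim=16, pooled_projection_dim=16,
    )

print(sorted(n for n, _ in model.transformer_blocks[0].attn.named_children()))
print(sorted(n for n, _ in model.single_transformer_blocks[0].attn.named_children()))
# The double-block attention is expected to expose to_q/to_k/to_v/to_out plus
# add_q_proj/add_k_proj/add_v_proj/to_add_out; the single-block attention has
# no separate output projection (it is fused into the block's proj_out).
```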
- `ValueError: Provide either prompt or prompt_embeds. Cannot leave both prompt and prompt_embeds undefined.`

Potential fix, in `app/diffusion/dataset/collect/calib.py`:
```python
def collect(config: DiffusionPtqRunConfig, dataset: datasets.Dataset):
    samples_dirpath = os.path.join(config.output.root, "samples")
    caches_dirpath = os.path.join(config.output.root, "caches")
    os.makedirs(samples_dirpath, exist_ok=True)
    os.makedirs(caches_dirpath, exist_ok=True)
    caches = []
    pipeline = config.pipeline.build()
    model = pipeline.unet if hasattr(pipeline, "unet") else pipeline.transformer
    assert isinstance(model, nn.Module)
    model.register_forward_hook(CollectHook(caches=caches), with_kwargs=True)
    batch_size = config.eval.batch_size
    print(f"In total {len(dataset)} samples")
    print(f"Evaluating with batch size {batch_size}")
    pipeline.set_progress_bar_config(desc="Sampling", leave=False, dynamic_ncols=True, position=1)
    for batch in tqdm(
        dataset.iter(batch_size=batch_size, drop_last_batch=False),
        desc="Data",
        leave=False,
        dynamic_ncols=True,
        total=(len(dataset) + batch_size - 1) // batch_size,
    ):
        filenames = batch["filename"]
        prompts = batch["prompt"]
        seeds = [hash_str_to_int(name) for name in filenames]
        generators = [torch.Generator(device=pipeline.device).manual_seed(seed) for seed in seeds]
        pipeline_kwargs = config.eval.get_pipeline_kwargs()
        task = config.pipeline.task
        control_root = config.eval.control_root
        if task in ["canny-to-image", "depth-to-image", "inpainting"]:
            controls = get_control(
                task,
                batch["image"],
                names=batch["filename"],
                # NOTE: `collect_config` is undefined in this snippet as pasted;
                # it presumably refers to the dataset-collection config in scope.
                data_root=os.path.join(
                    control_root, collect_config.dataset_name, f"{dataset.config_name}-{config.eval.num_samples}"
                ),
            )
            if task == "inpainting":
                pipeline_kwargs["image"] = controls[0]
                pipeline_kwargs["mask_image"] = controls[1]
            else:
                pipeline_kwargs["control_image"] = controls
        # Handle meta tensors by moving individual components.
        try:
            pipeline = pipeline.to("cuda")
        except NotImplementedError:
            # Fall back to per-component moves; to_empty() allocates uninitialized
            # storage for modules still on the meta device.
            for component_name in ["transformer", "text_encoder", "text_encoder_2", "vae"]:
                component = getattr(pipeline, component_name, None)
                if component is None:
                    continue
                try:
                    setattr(pipeline, component_name, component.to("cuda"))
                except NotImplementedError:
                    setattr(pipeline, component_name, component.to_empty(device="cuda"))
        # Pass the prompts explicitly; calling the pipeline without `prompt`
        # (or `prompt_embeds`) raises the ValueError above.
        result_images = pipeline(prompt=prompts, generator=generators, **pipeline_kwargs).images
        num_guidances = (len(caches) // batch_size) // config.eval.num_steps
        num_steps = len(caches) // (batch_size * num_guidances)
        assert (
            len(caches) == batch_size * num_steps * num_guidances
        ), f"Unexpected number of caches: {len(caches)} != {batch_size} * {config.eval.num_steps} * {num_guidances}"
        for j, (filename, image) in enumerate(zip(filenames, result_images, strict=True)):
            image.save(os.path.join(samples_dirpath, f"{filename}.png"))
            for s in range(num_steps):
                for g in range(num_guidances):
                    c = caches[s * batch_size * num_guidances + g * batch_size + j]
                    c["filename"] = filename
                    c["step"] = s
                    c["guidance"] = g
                    c = tree_map(lambda x: process(x), c)
                    torch.save(c, os.path.join(caches_dirpath, f"{filename}-{s:05d}-{g}.pt"))
        caches.clear()
```
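
For a quick sanity check after collection, each cached step can be loaded back on its own; the file naming follows the `torch.save` call above (the sample name below is hypothetical):

```python
import os

import torch

caches_dirpath = "caches"  # i.e. <config.output.root>/caches from the run above
# Files are written as "{filename}-{step:05d}-{guidance}.pt"; "sample" is a
# hypothetical filename here.
cache = torch.load(os.path.join(caches_dirpath, "sample-00000-0.pt"))  # add weights_only=False if the caches hold non-tensor objects
print(cache["filename"], cache["step"], cache["guidance"])
print(sorted(cache.keys()))  # the remaining entries come from CollectHook
```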
## References

- https://github.com/nunchaku-tech/deepcompressor/blob/main/deepcompressor/nn/struct/attn.py
- https://github.com/nunchaku-tech/nunchaku/blob/main/examples/flux.1-kontext-dev.py
- https://github.com/nunchaku-tech/nunchaku/commit/b99fb8be615bc98c6915bbe06a1e0092cbc074a5
- https://github.com/nunchaku-tech/deepcompressor/issues/91
## Dependencies

- https://github.com/Dao-AILab/flash-attention
- https://github.com/facebookresearch/xformers
- https://github.com/openai/CLIP
- https://github.com/THUDM/ImageReward
### Wheels

- https://huggingface.co/datasets/siraxe/PrecompiledWheels_Torch-2.8-cu128-cp312
- https://huggingface.co/lldacing/flash-attention-windows-wheel