lym00 committed
Commit 77f0198 · verified · 1 Parent(s): b1d4119

Update README.md

Files changed (1)
  1. README.md +284 -34
README.md CHANGED
@@ -128,7 +128,7 @@ Potential fix: app.diffusion.pipeline.config.py
 if isinstance(module, torch.nn.Module):
     try:
         print(f">>> Moving {name} to {device} using to_empty()")
-        module.to_empty(device)
+        module.to_empty(device=device)
     except Exception as e:
         print(f">>> WARNING: {name}.to_empty({device}) failed: {e}")
        try:
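Note: the one-line change above works because `torch.nn.Module.to_empty` accepts `device` only as a keyword argument, so the positional call raises the `TypeError` seen in the debug log below. A minimal, self-contained repro (illustrative, not part of the commit; runs on CPU):

```python
import torch

# Build a module on the meta device: parameters have shapes but no storage.
with torch.device("meta"):
    module = torch.nn.Linear(4, 4)

try:
    module.to_empty("cpu")  # TypeError: to_empty() takes 1 positional argument but 2 were given
except TypeError as e:
    print(e)

# Correct: `device` is keyword-only; this allocates uninitialized storage on the target device.
module = module.to_empty(device="cpu")
print(module.weight.is_meta)  # False
```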
@@ -150,38 +150,6 @@ Potential fix: app.diffusion.pipeline.config.py
     return pipeline
 ```
 
-Debug Log
-```
-25-07-22 20:11:56 | I | === Start Evaluating ===
-25-07-22 20:11:56 | I | * Building diffusion model pipeline
-Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]
-You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
-Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 18.92it/s]
-Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 7/7 [00:00<00:00, 9.50it/s]
->>> DEVICE: cuda
->>> PIPELINE TYPE: <class 'diffusers.pipelines.flux.pipeline_flux_kontext.FluxKontextPipeline'>
->>> Moving transformer to cuda using to_empty()
->>> WARNING: transformer.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
->>> Falling back to transformer.to(cuda)
->>> ERROR: transformer.to(cuda) also failed: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
->>> Moving vae to cuda using to_empty()
->>> WARNING: vae.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
->>> Falling back to vae.to(cuda)
->>> Moving text_encoder to cuda using to_empty()
->>> WARNING: text_encoder.to_empty(cuda) failed: Module.to_empty() takes 1 positional argument but 2 were given
->>> Falling back to text_encoder.to(cuda)
-25-07-22 20:11:59 | I | Replacing fused Linear with ConcatLinear.
-25-07-22 20:11:59 | I | + Replacing fused Linear in single_transformer_blocks.0 with ConcatLinear.
-25-07-22 20:11:59 | I | - in_features = 3072/15360
-25-07-22 20:11:59 | I | - out_features = 3072
-25-07-22 20:11:59 | I | + Replacing fused Linear in single_transformer_blocks.1 with ConcatLinear.
-25-07-22 20:11:59 | I | - in_features = 3072/15360
-25-07-22 20:11:59 | I | - out_features = 3072
-25-07-22 20:11:59 | I | + Replacing fused Linear in single_transformer_blocks.2 with ConcatLinear.
-25-07-22 20:11:59 | I | - in_features = 3072/15360
-25-07-22 20:11:59 | I | - out_features = 3072
-```
-
 2) KeyError: <class 'diffusers.models.transformers.transformer_flux.FluxAttention'>
 
 Potential fix: app.diffusion.nn.struct.py
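Note: the `ERROR` line in the removed debug log shows the second failure mode: even with the signature fixed, `Module.to()` cannot move a meta-initialized module because there is no data to copy. A standalone sketch of the failure and the recovery pattern the fixes in this README rely on (the state dict here is a stand-in for a real checkpoint):

```python
import torch

with torch.device("meta"):
    module = torch.nn.Linear(4, 4)

try:
    module.to("cpu")  # Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() ...
except RuntimeError as e:  # NotImplementedError is a subclass of RuntimeError
    print(e)

# Recovery: allocate storage first, then fill it with real weights.
module = module.to_empty(device="cpu")
module.load_state_dict({"weight": torch.randn(4, 4), "bias": torch.randn(4)})
```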
@@ -413,7 +381,7 @@ def collect(config: DiffusionPtqRunConfig, dataset: datasets.Dataset):
 
 4) RuntimeError: Tensor.item() cannot be called on meta tensors
 
-Potential Fix: deepcompressor.quantizer.impl.scale.py
+Potential Fix: quantizer.impl.scale.py
 
 ```python
 def quantize_scale(
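Note: error 4) reproduces in two lines; `quantize_scale`-style code that needs a Python scalar must make sure the tensor is backed by real storage first (a minimal sketch, independent of the deepcompressor code):

```python
import torch

s = torch.empty((), device="meta")
try:
    s.item()  # RuntimeError: Tensor.item() cannot be called on meta tensors
except RuntimeError as e:
    print(e)

# Guard: only read a scalar once the tensor has left the meta device.
if not s.is_meta:
    value = s.item()
```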
@@ -591,6 +559,288 @@ def quantize_scale(
     return s, z
 ```
 
+Potential Fix: app.diffusion.ptq.py
+
+```python
+def ptq(  # noqa: C901
+    model: DiffusionModelStruct,
+    config: DiffusionQuantConfig,
+    cache: DiffusionPtqCacheConfig | None = None,
+    load_dirpath: str = "",
+    save_dirpath: str = "",
+    copy_on_save: bool = False,
+    save_model: bool = False,
+) -> DiffusionModelStruct:
+    """Post-training quantization of a diffusion model.
+
+    Args:
+        model (`DiffusionModelStruct`):
+            The diffusion model.
+        config (`DiffusionQuantConfig`):
+            The diffusion model post-training quantization configuration.
+        cache (`DiffusionPtqCacheConfig`, *optional*, defaults to `None`):
+            The diffusion model quantization cache path configuration.
+        load_dirpath (`str`, *optional*, defaults to `""`):
+            The directory path to load the quantization checkpoint.
+        save_dirpath (`str`, *optional*, defaults to `""`):
+            The directory path to save the quantization checkpoint.
+        copy_on_save (`bool`, *optional*, defaults to `False`):
+            Whether to copy the cache to the save directory.
+        save_model (`bool`, *optional*, defaults to `False`):
+            Whether to save the quantized model checkpoint.
+
+    Returns:
+        `DiffusionModelStruct`:
+            The quantized diffusion model.
+    """
+    logger = tools.logging.getLogger(__name__)
+    if not isinstance(model, DiffusionModelStruct):
+        model = DiffusionModelStruct.construct(model)
+    assert isinstance(model, DiffusionModelStruct)
+
+    quant_wgts = config.enabled_wgts
+    quant_ipts = config.enabled_ipts
+    quant_opts = config.enabled_opts
+    quant_acts = quant_ipts or quant_opts
+    quant = quant_wgts or quant_acts
+
+    load_model_path, load_path, save_path = "", None, None
+    if load_dirpath:
+        load_path = DiffusionQuantCacheConfig(
+            smooth=os.path.join(load_dirpath, "smooth.pt"),
+            branch=os.path.join(load_dirpath, "branch.pt"),
+            wgts=os.path.join(load_dirpath, "wgts.pt"),
+            acts=os.path.join(load_dirpath, "acts.pt"),
+        )
+        load_model_path = os.path.join(load_dirpath, "model.pt")
+        if os.path.exists(load_model_path):
+            if config.enabled_wgts and config.wgts.enabled_low_rank:
+                if os.path.exists(load_path.branch):
+                    load_model = True
+                else:
+                    logger.warning(f"Model low-rank branch checkpoint {load_path.branch} does not exist")
+                    load_model = False
+            else:
+                load_model = True
+            if load_model:
+                logger.info(f"* Loading model from {load_model_path}")
+                save_dirpath = ""  # do not save the model if loading
+        else:
+            logger.warning(f"Model checkpoint {load_model_path} does not exist")
+            load_model = False
+    else:
+        load_model = False
+    if save_dirpath:
+        os.makedirs(save_dirpath, exist_ok=True)
+        save_path = DiffusionQuantCacheConfig(
+            smooth=os.path.join(save_dirpath, "smooth.pt"),
+            branch=os.path.join(save_dirpath, "branch.pt"),
+            wgts=os.path.join(save_dirpath, "wgts.pt"),
+            acts=os.path.join(save_dirpath, "acts.pt"),
+        )
+    else:
+        save_model = False
+
+    if quant and config.enabled_rotation:
+        logger.info("* Rotating model for quantization")
+        tools.logging.Formatter.indent_inc()
+        rotate_diffusion(model, config=config)
+        tools.logging.Formatter.indent_dec()
+        gc.collect()
+        torch.cuda.empty_cache()
+
+    # region smooth quantization
+    if quant and config.enabled_smooth:
+        logger.info("* Smoothing model for quantization")
+        tools.logging.Formatter.indent_inc()
+        load_from = ""
+        if load_path and os.path.exists(load_path.smooth):
+            load_from = load_path.smooth
+        elif cache and cache.path.smooth and os.path.exists(cache.path.smooth):
+            load_from = cache.path.smooth
+        if load_from:
+            logger.info(f"- Loading smooth scales from {load_from}")
+            smooth_cache = torch.load(load_from)
+            smooth_diffusion(model, config, smooth_cache=smooth_cache)
+        else:
+            logger.info("- Generating smooth scales")
+            smooth_cache = smooth_diffusion(model, config)
+            if cache and cache.path.smooth:
+                logger.info(f"- Saving smooth scales to {cache.path.smooth}")
+                os.makedirs(cache.dirpath.smooth, exist_ok=True)
+                torch.save(smooth_cache, cache.path.smooth)
+                load_from = cache.path.smooth
+        if save_path:
+            if not copy_on_save and load_from:
+                logger.info(f"- Linking smooth scales to {save_path.smooth}")
+                os.symlink(os.path.relpath(load_from, save_dirpath), save_path.smooth)
+            else:
+                logger.info(f"- Saving smooth scales to {save_path.smooth}")
+                torch.save(smooth_cache, save_path.smooth)
+        del smooth_cache
+        tools.logging.Formatter.indent_dec()
+        gc.collect()
+        torch.cuda.empty_cache()
+    # endregion
+    # region collect original state dict
+    if config.needs_acts_quantizer_cache:
+        if load_path and os.path.exists(load_path.acts):
+            orig_state_dict = None
+        elif cache and cache.path.acts and os.path.exists(cache.path.acts):
+            orig_state_dict = None
+        else:
+            orig_state_dict: dict[str, torch.Tensor] = {
+                name: param.detach().clone() for name, param in model.module.named_parameters() if param.ndim > 1
+            }
+    else:
+        orig_state_dict = None
+    # endregion
+    if load_model:
+        logger.info(f"* Loading model checkpoint from {load_model_path}")
+        load_diffusion_weights_state_dict(
+            model,
+            config,
+            state_dict=torch.load(load_model_path),
+            branch_state_dict=torch.load(load_path.branch) if os.path.exists(load_path.branch) else None,
+        )
+        gc.collect()
+        torch.cuda.empty_cache()
+    elif quant_wgts:
+        logger.info("* Ensuring model is on actual device before quantization")
+
+        # Check if model has meta tensors
+        has_meta_tensors = any(param.is_meta for param in model.module.parameters())
+
+        if has_meta_tensors:
+            logger.info("* Model contains meta tensors, materializing to actual device")
+
+            # Option 1: Use to_empty() and reload weights (recommended)
+            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+            # Store original state dict if available
+            try:
+                original_state_dict = model.module.state_dict()
+                model.module = model.module.to_empty(device=device)
+                model.module.load_state_dict(original_state_dict)
+                logger.info("* Successfully materialized model with original weights")
+            except Exception as e:
+                logger.warning(f"* Failed to preserve weights during materialization: {e}")
+                # Fallback: just move to empty device (weights will be zero)
+                model.module = model.module.to_empty(device=device)
+                logger.warning("* Model moved to device but weights may be uninitialized")
+        else:
+            # Model already has real tensors, just ensure it's on the right device
+            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+            model.module = model.module.to(device)
+
+        # Verify no meta tensors remain
+        remaining_meta = [name for name, param in model.module.named_parameters() if param.is_meta]
+        if remaining_meta:
+            raise RuntimeError(f"Parameters still on meta device: {remaining_meta}")
+
+        logger.info("* Model successfully prepared for quantization")
+
+        logger.info("* Quantizing weights")
+        tools.logging.Formatter.indent_inc()
+        quantizer_state_dict, quantizer_load_from = None, ""
+        if load_path and os.path.exists(load_path.wgts):
+            quantizer_load_from = load_path.wgts
+        elif cache and cache.path.wgts and os.path.exists(cache.path.wgts):
+            quantizer_load_from = cache.path.wgts
+        if quantizer_load_from:
+            logger.info(f"- Loading weight settings from {quantizer_load_from}")
+            quantizer_state_dict = torch.load(quantizer_load_from)
+        branch_state_dict, branch_load_from = None, ""
+        if load_path and os.path.exists(load_path.branch):
+            branch_load_from = load_path.branch
+        elif cache and cache.path.branch and os.path.exists(cache.path.branch):
+            branch_load_from = cache.path.branch
+        if branch_load_from:
+            logger.info(f"- Loading branch settings from {branch_load_from}")
+            branch_state_dict = torch.load(branch_load_from)
+        if not quantizer_load_from:
+            logger.info("- Generating weight settings")
+        if not branch_load_from:
+            logger.info("- Generating branch settings")
+        quantizer_state_dict, branch_state_dict, scale_state_dict = quantize_diffusion_weights(
+            model,
+            config,
+            quantizer_state_dict=quantizer_state_dict,
+            branch_state_dict=branch_state_dict,
+            return_with_scale_state_dict=bool(save_dirpath),
+        )
+        if not quantizer_load_from and cache and cache.dirpath.wgts:
+            logger.info(f"- Saving weight settings to {cache.path.wgts}")
+            os.makedirs(cache.dirpath.wgts, exist_ok=True)
+            torch.save(quantizer_state_dict, cache.path.wgts)
+            quantizer_load_from = cache.path.wgts
+        if not branch_load_from and cache and cache.dirpath.branch:
+            logger.info(f"- Saving branch settings to {cache.path.branch}")
+            os.makedirs(cache.dirpath.branch, exist_ok=True)
+            torch.save(branch_state_dict, cache.path.branch)
+            branch_load_from = cache.path.branch
+        if save_path:
+            if not copy_on_save and quantizer_load_from:
+                logger.info(f"- Linking weight settings to {save_path.wgts}")
+                os.symlink(os.path.relpath(quantizer_load_from, save_dirpath), save_path.wgts)
+            else:
+                logger.info(f"- Saving weight settings to {save_path.wgts}")
+                torch.save(quantizer_state_dict, save_path.wgts)
+            if not copy_on_save and branch_load_from:
+                logger.info(f"- Linking branch settings to {save_path.branch}")
+                os.symlink(os.path.relpath(branch_load_from, save_dirpath), save_path.branch)
+            else:
+                logger.info(f"- Saving branch settings to {save_path.branch}")
+                torch.save(branch_state_dict, save_path.branch)
+            if save_model:
+                logger.info(f"- Saving model to {save_dirpath}")
+                torch.save(scale_state_dict, os.path.join(save_dirpath, "scale.pt"))
+                torch.save(model.module.state_dict(), os.path.join(save_dirpath, "model.pt"))
+        del quantizer_state_dict, branch_state_dict, scale_state_dict
+        tools.logging.Formatter.indent_dec()
+        gc.collect()
+        torch.cuda.empty_cache()
+    if quant_acts:
+        logger.info("* Quantizing activations")
+        tools.logging.Formatter.indent_inc()
+        if config.needs_acts_quantizer_cache:
+            load_from = ""
+            if load_path and os.path.exists(load_path.acts):
+                load_from = load_path.acts
+            elif cache and cache.path.acts and os.path.exists(cache.path.acts):
+                load_from = cache.path.acts
+            if load_from:
+                logger.info(f"- Loading activation settings from {load_from}")
+                quantizer_state_dict = torch.load(load_from)
+                quantize_diffusion_activations(
+                    model, config, quantizer_state_dict=quantizer_state_dict, orig_state_dict=orig_state_dict
+                )
+            else:
+                logger.info("- Generating activation settings")
+                quantizer_state_dict = quantize_diffusion_activations(model, config, orig_state_dict=orig_state_dict)
+                if cache and cache.dirpath.acts and quantizer_state_dict is not None:
+                    logger.info(f"- Saving activation settings to {cache.path.acts}")
+                    os.makedirs(cache.dirpath.acts, exist_ok=True)
+                    torch.save(quantizer_state_dict, cache.path.acts)
+                    load_from = cache.path.acts
+            if save_dirpath:
+                if not copy_on_save and load_from:
+                    logger.info(f"- Linking activation quantizer settings to {save_path.acts}")
+                    os.symlink(os.path.relpath(load_from, save_dirpath), save_path.acts)
+                else:
+                    logger.info(f"- Saving activation quantizer settings to {save_path.acts}")
+                    torch.save(quantizer_state_dict, save_path.acts)
+            del quantizer_state_dict
+        else:
+            logger.info("- No need to generate/load activation quantizer settings")
+            quantize_diffusion_activations(model, config, orig_state_dict=orig_state_dict)
+        tools.logging.Formatter.indent_dec()
+        del orig_state_dict
+        gc.collect()
+        torch.cuda.empty_cache()
+    return model
+```
+
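Note: the core of the patch above is a three-step pattern: snapshot the state dict, materialize storage with `to_empty(device=...)`, then restore the weights and verify nothing is left on `meta`. A condensed, self-contained sketch of that pattern (`materialize_meta_module` and the toy checkpoint are illustrative, not deepcompressor APIs):

```python
import torch


def materialize_meta_module(module: torch.nn.Module, state_dict: dict, device: str) -> torch.nn.Module:
    """Materialize a meta-initialized module on `device` and load real weights into it."""
    if any(p.is_meta for p in module.parameters()):
        module = module.to_empty(device=device)  # allocate uninitialized storage
        module.load_state_dict(state_dict)  # fill it with real weights
    else:
        module = module.to(device)
    # Verify no parameters remain on the meta device.
    leftover = [n for n, p in module.named_parameters() if p.is_meta]
    if leftover:
        raise RuntimeError(f"Parameters still on meta device: {leftover}")
    return module


# Usage with a toy module and an in-memory "checkpoint":
with torch.device("meta"):
    net = torch.nn.Linear(8, 8)
ckpt = {"weight": torch.randn(8, 8), "bias": torch.randn(8)}
net = materialize_meta_module(net, ckpt, "cpu")
```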
 References
 
 https://github.com/nunchaku-tech/nunchaku/commit/b99fb8be615bc98c6915bbe06a1e0092cbc074a5