RuntimeError: Latent Reshape Dimension Mismatch with GGUF‑Quantized FluxTransformer2DModel in Diffusers

#2 by ChangshiLi

I’m trying to load a GGUF‑quantized FLUX.1-Kontext transformer with 🤗 Diffusers and plug it into FluxKontextPipeline. I followed the official GGUF quantization doc, https://huggingface.co/docs/diffusers/main/en/quantization/gguf, and my code looks like this:

    import torch
    from diffusers import FluxKontextPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

    torch_dtype = torch.float16 if fp16 else torch.float32
    transformer = FluxTransformer2DModel.from_single_file(
        gguf_file,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch_dtype),
        torch_dtype=torch_dtype,
    )
    pipeline = FluxKontextPipeline.from_pretrained(
        'my_local_path/black-forest-labs/FLUX.1-Kontext-dev',
        local_files_only=True,
        torch_dtype=torch_dtype,
        transformer=transformer,
    )

However, it fails with the following shape error:

    latents = latents.view(batch_size, num_channels_latents, height // 2, 2, width // 2, 2)
    RuntimeError: shape '[2, 32, 64, 2, 64, 2]' is invalid for input of size 524288
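For what it's worth, the numbers in the error can be checked directly: the target shape requires 2 × 32 × 64 × 2 × 64 × 2 = 1,048,576 elements, but the latent tensor only has 524,288, exactly half. A minimal diagnostic sketch (pure arithmetic, no Diffusers needed, and the interpretation is only a guess) is:

```python
import math

# Target shape from the failing latents.view(...) call in the traceback.
target_shape = (2, 32, 64, 2, 64, 2)
expected = math.prod(target_shape)  # elements the reshape requires
actual = 524288                     # elements the tensor actually has

print(expected)           # 1048576
print(expected // actual) # 2 -> the tensor is exactly half the needed size
```

The clean factor of two suggests a batch-size or latent-packing mismatch (e.g. batch_size doubled while the latents were not) rather than anything specific to GGUF quantization, though I can't rule the latter out.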

Any suggestions would be greatly appreciated. Thanks!
