Qwen-Image-Edit Low-Resolution Input Repair LoRA

Model Introduction

Qwen-Image-Edit is a powerful open-source image editing model. However, when the input image resolution is lower than the target output resolution, the model preserves image details poorly. To address this, we made the following two modifications:

1. RoPE Interpolation: The position encoding of the input image in the Qwen-Image DiT is replaced by an interpolated sampling of the position encoding at the target resolution. This modification takes effect independently of modification 2.
2. LoRA Fine-tuning: A LoRA is trained quickly to improve the DiT's generalization to this interpolated encoding.
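The idea behind modification 1 can be sketched in a few lines. This is an illustrative 1D sketch only, not the DiffSynth-Studio implementation: the helper names `interpolated_positions` and `rope_angles`, the align-corners style mapping, and the figure of 16 pixels per latent token per axis are all assumptions made for the example.

```python
import math

def interpolated_positions(src_len: int, dst_len: int) -> list[float]:
    """Spread src_len token indices over a dst_len position grid.

    Assumption: endpoints are aligned (first token -> 0, last token -> dst_len - 1).
    """
    if src_len == 1:
        return [(dst_len - 1) / 2]
    scale = (dst_len - 1) / (src_len - 1)
    return [i * scale for i in range(src_len)]

def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Standard 1D RoPE rotation angles for one (possibly fractional) position."""
    return [position * base ** (-2 * k / dim) for k in range(dim // 2)]

# Hypothetical setup: a 512-pixel-wide edit image and a 1024-pixel-wide target,
# assuming 16 pixels per latent token along each axis.
src_tokens, dst_tokens = 512 // 16, 1024 // 16
positions = interpolated_positions(src_tokens, dst_tokens)

# The low-resolution tokens now cover the same position range as the target grid,
# at fractional positions, instead of occupying only its first src_tokens slots.
angles_for_last_token = rope_angles(positions[-1], dim=8)
```

Because the low-resolution tokens receive positions that span the whole target grid, each input token lines up spatially with the output region it corresponds to, which is the property the LoRA is then trained to exploit.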

With these two modifications, the model produces edited images that stay consistent with the input even when the input is low-resolution. In addition, inference is significantly faster than with high-resolution input (39 s at 256x256 versus 98 s at 1024x1024 on an A100, see the table below).

Effect Demonstration

Image Editing Instruction: Change the skirt to pink.

Input Resolution	A100 Inference Time
256x256	39 s
512x512	50 s
768x768	67 s
1024x1024	98 s

[Comparison images for each row: Input Image, Original Model, RoPE Interpolation, RoPE Interpolation + LoRA Fine-tuning]
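To make the time savings concrete, the short sketch below computes the speedup of each input resolution relative to the 1024x1024 row. It uses only the A100 timings listed above; the variable names are illustrative.

```python
# A100 inference times in seconds, copied from the table above.
inference_time_s = {"256x256": 39, "512x512": 50, "768x768": 67, "1024x1024": 98}

# Speedup of each input resolution relative to the 1024x1024 input.
baseline = inference_time_s["1024x1024"]
speedup = {res: round(baseline / t, 2) for res, t in inference_time_s.items()}

for res, factor in speedup.items():
    print(f"{res}: {factor}x relative to 1024x1024 input")
```

With these numbers, a 256x256 input runs roughly 2.5 times faster than a 1024x1024 input, and a 512x512 input roughly 2 times faster.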

Limitations

Using low-resolution input to generate high-resolution output greatly reduces inference time, but it may degrade the model's editing performance.
The analysis above focuses only on the model's ability to preserve image detail.

Inference Code

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
import torch
from modelscope import snapshot_download

# Load the Qwen-Image-Edit DiT together with the text encoder and VAE from Qwen-Image.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=None,
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)
# Download the LoRA weights and load them into the DiT.
snapshot_download("DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix", local_dir="models/DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix", allow_file_pattern="model.safetensors")
pipe.load_lora(pipe.dit, "models/DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix/model.safetensors")

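# Step 1: generate a base image, 768 wide and 1024 high.
prompt = "Exquisite portrait of an underwater girl with flowing blue dress and fluttering hair. Transparent light and shadow, surrounded by bubbles. Her face is serene, with exquisite details and dreamy beauty."
image = pipe(prompt=prompt, seed=0, num_inference_steps=40, height=1024, width=768)
image.save("image.jpg")

# Step 2: downscale the base image to half resolution, then edit it at the full target resolution.
prompt = "turn skirt pink"
# PIL's resize takes (width, height); (384, 512) keeps the 3:4 aspect ratio of the base image.
image = image.resize((384, 512))
# edit_rope_interpolation=True enables the RoPE interpolation; edit_image_auto_resize=False keeps the input at its low resolution.
image = pipe(prompt, edit_image=image, seed=1, num_inference_steps=40, height=1024, width=768, edit_rope_interpolation=True, edit_image_auto_resize=False)
image.save("image2.jpg")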
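
When preparing your own low-resolution inputs, it helps to compute the downscaled size so that the aspect ratio matches the target output. The helper below is a hypothetical sketch, not part of DiffSynth-Studio: the name `lowres_size` and the choice to snap each side to a multiple of 16 pixels are assumptions, and the pipeline may not require the latter.

```python
def lowres_size(width: int, height: int, factor: float, multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) downscaled by `factor`.

    Each side is rounded to the nearest multiple of `multiple`
    (an assumed constraint) and is never smaller than one multiple.
    """
    def snap(value: int) -> int:
        return max(multiple, round(value / factor / multiple) * multiple)

    return snap(width), snap(height)

# Target output of 768 wide by 1024 high, input at half resolution.
half = lowres_size(768, 1024, factor=2)
# Same target, input at quarter resolution.
quarter = lowres_size(768, 1024, factor=4)
```

The returned tuple is in (width, height) order, which is the order PIL's `Image.resize` expects, so it can be passed directly as `image.resize(lowres_size(768, 1024, factor=2))`.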

license: apache-2.0
