Commit 23a10af (verified) by SahilCarterr
Parent: 938df2a

Update README.md

Files changed (1):
  1. README.md +99 -3
README.md CHANGED
@@ -1,3 +1,99 @@
- ---
- license: apache-2.0
- ---
+ ---
+ frameworks:
+ - Pytorch
+
+ tasks:
+ - image-to-image-synthesis
+
+ #model-type:
+ ##e.g. gpt, phi, llama, chatglm, baichuan
+ #- gpt
+
+ #domain:
+ ##e.g. nlp, cv, audio, multi-modal
+ #- nlp
+
+ #language:
+ ##list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
+ #- cn
+
+ #metrics:
+ ##e.g. CIDEr, BLEU, ROUGE
+ #- CIDEr
+
+ #tags:
+ ##custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
+ #- pretrained
+
+ #tools:
+ ##e.g. vllm, fastchat, llamacpp, AdaSeq
+ #- vllm
+ base_model:
+ - Qwen/Qwen-Image-Edit
+ base_model_relation: adapter
+ ---
+ # Qwen-Image-Edit Low-Resolution Input Repair LoRA
+
+ ## Model Introduction
+
+ [Qwen-Image-Edit](https://www.modelscope.cn/models/Qwen/Qwen-Image-Edit) is a powerful open-source image editing model. However, when the input image has a lower resolution than the target output resolution, the model struggles to preserve details from the input image. To address this, we made two modifications:
+
+ 1. RoPE interpolation: the positional encoding (RoPE) of the input image in the Qwen-Image DiT is replaced with an interpolated sampling of the positional encoding at the target resolution. This modification can be applied independently of modification 2.
+ 2. LoRA fine-tuning: a LoRA is briefly trained to help the DiT generalize to this interpolated encoding.
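A minimal sketch of the RoPE-interpolation idea along one spatial axis, in plain Python. It is not DiffSynth-Studio's implementation: it assumes 16 pixels per image token (8x VAE downsampling followed by 2x2 patches), corner-aligned linear rescaling of positions, and the standard RoPE frequency schedule with base 10000. The helper names are hypothetical. For a 2D image, the same rescaling would be applied separately to the height and width axes.

```python
PIXELS_PER_TOKEN = 16  # assumption: 8x VAE downsampling followed by 2x2 patches


def interpolated_positions(num_input_tokens: int, num_target_tokens: int) -> list[float]:
    """Hypothetical helper: positions for the tokens of a low-resolution input
    along one spatial axis, sampled from the target-resolution grid.

    Input token i gets position i * (num_target_tokens / num_input_tokens), so the
    input spans the same coordinate range as a target-resolution image instead of
    only the first num_input_tokens positions.
    """
    scale = num_target_tokens / num_input_tokens
    return [i * scale for i in range(num_input_tokens)]


def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Standard 1D RoPE rotation angles for a (possibly fractional) position:
    angle_k = position * base ** (-2k / dim) for k = 0 .. dim/2 - 1."""
    return [position * base ** (-2 * k / dim) for k in range(dim // 2)]


# Example: one axis of a 512 px input edited into a 1024 px output.
input_tokens = 512 // PIXELS_PER_TOKEN    # 32 tokens
target_tokens = 1024 // PIXELS_PER_TOKEN  # 64 tokens

default_positions = list(range(input_tokens))                           # 0, 1, ..., 31
interp_positions = interpolated_positions(input_tokens, target_tokens)  # 0.0, 2.0, ..., 62.0

print(default_positions[-1], interp_positions[-1])  # 31 62.0
# The last input token now receives the same rotation as the target-grid token at 62.
print(rope_angles(interp_positions[-1], dim=8))
```

With equal input and output resolutions the scale is 1 and the positions are unchanged, so in this sketch the interpolation only has an effect when the input is smaller than the output.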
+
+ With these two modifications, the model produces edited images that remain consistent with the input even when the input is low-resolution. Inference is also significantly faster than with a high-resolution input.
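A rough token count helps explain the speed-up. Assuming the DiT processes the input-image latent tokens in the same sequence as the output latent tokens (a simplification of how Qwen-Image-Edit conditions on the input image), a smaller input shortens that sequence. The estimate below further assumes 16 pixels per token and a 1024x1024 output, and it ignores text tokens; it is arithmetic, not a measurement, and the function names are hypothetical.

```python
PIXELS_PER_TOKEN = 16  # assumption: 8x VAE downsampling followed by 2x2 patches


def image_tokens(height: int, width: int) -> int:
    """Number of image tokens for an image of the given pixel size."""
    return (height // PIXELS_PER_TOKEN) * (width // PIXELS_PER_TOKEN)


def sequence_length(input_side: int, output_side: int = 1024) -> int:
    """Hypothetical estimate of the image-token sequence length when a square
    input is edited into a square output: output tokens + input tokens
    (text tokens are ignored)."""
    return image_tokens(output_side, output_side) + image_tokens(input_side, input_side)


for side in (256, 512, 768, 1024):
    print(f"{side}x{side} input -> {sequence_length(side)} image tokens")
```

Under these assumptions the sequence grows from 4352 tokens (256x256 input) to 8192 tokens (1024x1024 input). Attention cost grows roughly quadratically with sequence length, but real timings also include text encoding, VAE decoding and other costs, so they should not be expected to scale exactly with these counts.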
+
+ ## Results
+
+ Editing instruction: "Change the skirt to pink."
+ |Input Resolution|A100 Inference Time|Input Image|Original Model|RoPE Interpolation|RoPE Interpolation + LoRA Fine-tuning|
+ |-|-|-|-|-|-|
+ |256x256| 39 s |![](./assets/image1.jpg)|![](./assets/origin_256.jpg)|![](./assets/rope_256.jpg)|![](./assets/lora_256.jpg)|
+ |512x512| 50 s |![](./assets/image1.jpg)|![](./assets/origin_512.jpg)|![](./assets/rope_512.jpg)|![](./assets/lora_512.jpg)|
+ |768x768| 67 s |![](./assets/image1.jpg)|![](./assets/origin_768.jpg)|![](./assets/rope_768.jpg)|![](./assets/lora_768.jpg)|
+ |1024x1024| 98 s |![](./assets/image1.jpg)|![](./assets/origin_1024.jpg)|![](./assets/origin_1024.jpg)|![](./assets/lora_1024.jpg)|
+
+ ## Limitations
+
+ 1. Using low-resolution input to generate high-resolution output greatly reduces inference time, but it may degrade the model's editing performance.
+ 2. The analysis above covers only the model's ability to preserve image details.
+
+ ## Inference Code
+ ```shell
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
+ import torch
+ from modelscope import snapshot_download
+
+ pipe = QwenImagePipeline.from_pretrained(  # base Qwen-Image-Edit pipeline
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+     ],
+     tokenizer_config=None,
+     processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
+ )
+ snapshot_download("DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix", local_dir="models/DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix", allow_file_pattern="model.safetensors")
+ pipe.load_lora(pipe.dit, "models/DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix/model.safetensors")
+
+ prompt = "Exquisite portrait of an underwater girl with flowing blue dress and fluttering hair. Transparent light and shadow, surrounded by bubbles. Her face is serene, with exquisite details and dreamy beauty."
+ image = pipe(prompt=prompt, seed=0, num_inference_steps=40, height=1024, width=768)  # 768x1024 source image
+ image.save("image.jpg")
+
+ prompt = "turn skirt pink"
+ image = image.resize((384, 512))  # PIL takes (width, height): half of 768x1024, same aspect ratio
+ image = pipe(prompt, edit_image=image, seed=1, num_inference_steps=40, height=1024, width=768, edit_rope_interpolation=True, edit_image_auto_resize=False)
+ image.save("image2.jpg")
+ ```
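When preparing the low-resolution input, it helps to keep the aspect ratio of the requested output so that the interpolated positions cover the output grid without stretching the image; note that PIL's `Image.resize` takes `(width, height)`. The helper below is a hypothetical sketch: it scales the output size by a factor and rounds each side down to a multiple of 16, which is an assumption based on 16 pixels per token, not a documented requirement of the pipeline.

```python
def lowres_input_size(width: int, height: int, factor: float, multiple: int = 16) -> tuple[int, int]:
    """Hypothetical helper: (width, height) for a low-resolution edit input that
    keeps the aspect ratio of a width x height output.

    Each side is scaled by `factor` and rounded down to a multiple of `multiple`
    (never below one multiple). The result uses PIL's (width, height) order.
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    w = max(multiple, int(width * factor) // multiple * multiple)
    h = max(multiple, int(height * factor) // multiple * multiple)
    return w, h


# Output size from the example above: width=768, height=1024.
print(lowres_input_size(768, 1024, 0.5))   # (384, 512)
print(lowres_input_size(768, 1024, 0.25))  # (192, 256)
```

With a PIL image, `image.resize(lowres_input_size(image.width, image.height, 0.5))` would give a half-size input with the same aspect ratio.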
+
+ ---
+ license: apache-2.0
+ ---