SahilCarterr committed
Commit 8c404d5 · verified · 1 Parent(s): be143fb

Update README.md

Files changed (1): README.md (+145, -3)

README.md:

---
frameworks:
- Pytorch
tasks:
- text-to-image-synthesis

#model-type:
## e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt

#domain:
## e.g. nlp, cv, audio, multi-modal
#- nlp

#language:
## language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn

#metrics:
## e.g. CIDEr, BLEU, ROUGE, etc.
#- CIDEr

#tags:
## custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained

#tools:
## e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
#- vllm
base_model:
- Qwen/Qwen-Image
base_model_relation: adapter
license: apache-2.0
---
# Qwen-Image Image Inpaint Control Model

![](./assets/cover.png)

## Model Introduction

This model is a local image redraw (inpainting) model trained on [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). It uses a ControlNet architecture and can redraw local regions of an image. The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio), and the training dataset is [Qwen-Image-Self-Generated-Dataset](https://www.modelscope.cn/datasets/DiffSynth-Studio/Qwen-Image-Self-Generated-Dataset).

This model is compatible with both [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image) and [Qwen-Image-Edit](https://www.modelscope.cn/models/Qwen/Qwen-Image-Edit): it performs local redrawing with Qwen-Image and edits specified regions with Qwen-Image-Edit.

## Effect Demonstration

|Input Prompt|Input Image|Redrawn Image|
|-|-|-|
|A robot with wings and a hat standing in a colorful garden with flowers and butterflies.|![](./assets/image_1_1.jpg)|![](./assets/image_1_2.jpg)|
|A girl in a school uniform stands gracefully in front of a vibrant stained glass window with colorful geometric patterns.|![](./assets/image_2_1.jpg)|![](./assets/image_2_2.jpg)|
|A small wooden boat battles against towering, crashing waves in a stormy sea.|![](./assets/image_3_1.png)|![](./assets/image_3_2.png)|

## Limitations
- Inpaint models based on the ControlNet structure may produce disharmonious boundaries between the redrawn and non-redrawn areas.

- The model is trained on rectangular-area redraw data, so its generalization to non-rectangular masks might not be optimal (a rectangular-mask sketch follows this list).

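For reference, here is a minimal sketch of how a rectangular inpaint mask could be prepared with PIL for the pipelines below. The coordinates and output file name are illustrative, and it assumes the common convention that white marks the region to redraw while black is kept; compare with the example mask downloaded in the inference code below to confirm the convention.

```python
from PIL import Image, ImageDraw

# Build a rectangular inpaint mask at the resolution used in the examples below.
# Assumed convention: white = region to redraw, black = region to keep.
width, height = 1328, 1328
mask = Image.new("RGB", (width, height), "black")
draw = ImageDraw.Draw(mask)
draw.rectangle((400, 400, 900, 900), fill="white")  # (left, top, right, bottom), illustrative values
mask.save("rect_mask.png")  # can then be passed as inpaint_mask in the code below
```
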
## Inference Code

Install DiffSynth-Studio from source:

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
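
To confirm the editable install is picked up before downloading any model weights, a quick import check can help; this is just a sanity check (not part of the official instructions) and uses the same imports as the examples below.

```python
# Optional sanity check: these are the classes used by the inference examples below.
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput

print(QwenImagePipeline.__name__, ModelConfig.__name__, ControlNetInput.__name__)
```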

Qwen-Image:

```python
import torch
from PIL import Image
from modelscope import dataset_snapshot_download
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput

# Load Qwen-Image together with the Inpaint ControlNet weights.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# Download the example image and mask.
dataset_snapshot_download(
    dataset_id="DiffSynth-Studio/example_image_dataset",
    local_dir="./data/example_image_dataset",
    allow_file_pattern="inpaint/*.jpg"
)

prompt = "a cat with sunglasses"
controlnet_image = Image.open("./data/example_image_dataset/inpaint/image_1.jpg").convert("RGB").resize((1328, 1328))
inpaint_mask = Image.open("./data/example_image_dataset/inpaint/mask.jpg").convert("RGB").resize((1328, 1328))

# Redraw the masked region of the input image.
image = pipe(
    prompt, seed=0,
    input_image=controlnet_image, inpaint_mask=inpaint_mask,
    blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image, inpaint_mask=inpaint_mask)],
    num_inference_steps=40,
)
image.save("image.jpg")
```
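
In this call, the source image does double duty: it is passed as `input_image` for the inpainting pipeline and again, together with the mask, as the Blockwise ControlNet conditioning via `blockwise_controlnet_inputs`. The mask selects the region to be redrawn according to the prompt, while pixels outside it should stay close to the original.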

Qwen-Image-Edit:

```python
import torch
from PIL import Image
from modelscope import dataset_snapshot_download
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput

# Load Qwen-Image-Edit together with the Inpaint ControlNet weights.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Inpaint", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=None,
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)

# Download the example image and mask.
dataset_snapshot_download(
    dataset_id="DiffSynth-Studio/example_image_dataset",
    local_dir="./data/example_image_dataset",
    allow_file_pattern="inpaint/*.jpg"
)

prompt = "Put sunglasses on this cat"
controlnet_image = Image.open("./data/example_image_dataset/inpaint/image_1.jpg").convert("RGB").resize((1328, 1328))
inpaint_mask = Image.open("./data/example_image_dataset/inpaint/mask.jpg").convert("RGB").resize((1328, 1328))

# Edit the masked region of the input image.
image = pipe(
    prompt, seed=0,
    input_image=controlnet_image, inpaint_mask=inpaint_mask,
    blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image, inpaint_mask=inpaint_mask)],
    num_inference_steps=40,
    edit_image=controlnet_image,  # add edit_image here
)
image.save("image.jpg")
```
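
Compared with the Qwen-Image example, this variant swaps in the Qwen-Image-Edit transformer weights, replaces `tokenizer_config` with a `processor_config` pointing at the Qwen-Image-Edit processor, and additionally passes the source image as `edit_image`, so the edit instruction is applied within the masked region.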