SahilCarterr committed · Commit 1cbe253 · verified · Parent(s): b5718f0

Update README.md

Files changed (1): README.md (+104 −3)

---
license: apache-2.0
frameworks:
- Pytorch
tasks:
- text-to-image-synthesis
base_model:
- Qwen/Qwen-Image
base_model_relation: adapter
---

# Qwen-Image Image Structure Control Model

![](assets/title.png)

## Model Introduction

This model is a LoRA for image structure control, trained on [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image) using the In-Context Control approach. It supports six control conditions: canny, depth, lineart, softedge, normal, and openpose. The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio), and the training data is the [Qwen-Image-Self-Generated-Dataset](https://www.modelscope.cn/datasets/DiffSynth-Studio/Qwen-Image-Self-Generated-Dataset). It is recommended to start the prompt with "Context_Control. ".

Note that openpose control, by the nature of pose conditioning, cannot achieve the same "point-to-point" alignment with the control image as the other control types.

## Effect Demonstration

|Control Condition|Control Image|Generated Image 1|Generated Image 2|
|-|-|-|-|
|canny|![](./assets/1_canny.png)|![](./assets/canny_image_seed_1_blue.png)|![](./assets/canny_image_seed_3_pink.png)|
|depth|![](./assets/1_depth.png)|![](./assets/depth_image_seed_2_blue.png)|![](./assets/depth_image_seed_2_pink_1.png)|
|lineart|![](./assets/1_lineart.png)|![](./assets/lineart_image_seed_1_blue.png)|![](./assets/lineart_image_seed_2_pink_1.png)|
|softedge|![](./assets/1_softedge.png)|![](./assets/softedge_image_seed_2_blue.png)|![](./assets/softedge_image_seed_2_pink_1.png)|
|normal|![](./assets/1_normal.png)|![](./assets/normal_image_seed_2_blue.png)|![](./assets/normal_image_seed_2_pink_1.png)|
|openpose|![](./assets/1_openpose.png)|![](./assets/openpose_image_seed_1_blue.png)|![](./assets/openpose_image_seed_4_pink.png)|

## Inference Code

Install DiffSynth-Studio from source:

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
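
Before running the full script, you can optionally confirm that the editable install is importable; this is a minimal sanity check that only prints the location of the installed `diffsynth` package:

```python
# Minimal sanity check that the editable install is importable.
import diffsynth
print(diffsynth.__file__)
```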

The script below downloads the annotator weights and the Qwen-Image base model, applies the control LoRA, and then, for each control condition, extracts a control map from an example image and generates a 1024×1024 result:

```python
import torch
from PIL import Image
from modelscope import dataset_snapshot_download, snapshot_download
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
from diffsynth.controlnets.processors import Annotator

# Download the preprocessor (annotator) weights used to extract control maps.
allow_file_pattern = ["sk_model.pth", "sk_model2.pth", "dpt_hybrid-midas-501f0c75.pt", "ControlNetHED.pth", "body_pose_model.pth", "hand_pose_model.pth", "facenet.pth", "scannet.pt"]
snapshot_download("lllyasviel/Annotators", local_dir="models/Annotators", allow_file_pattern=allow_file_pattern)

# Load the Qwen-Image base model (DiT, text encoder, and VAE).
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# Download and apply the In-Context Control LoRA.
snapshot_download("DiffSynth-Studio/Qwen-Image-In-Context-Control-Union", local_dir="models/DiffSynth-Studio/Qwen-Image-In-Context-Control-Union", allow_file_pattern="model.safetensors")
pipe.load_lora(pipe.dit, "models/DiffSynth-Studio/Qwen-Image-In-Context-Control-Union/model.safetensors")

# Fetch the example image and generate one result per control condition.
dataset_snapshot_download(dataset_id="DiffSynth-Studio/examples_in_diffsynth", local_dir="./", allow_file_pattern="data/examples/qwen-image-context-control/image.jpg")
origin_image = Image.open("data/examples/qwen-image-context-control/image.jpg").resize((1024, 1024))
annotator_ids = ["openpose", "canny", "depth", "lineart", "softedge", "normal"]
for annotator_id in annotator_ids:
    # Extract the control map for this condition from the example image.
    annotator = Annotator(processor_id=annotator_id, device="cuda")
    control_image = annotator(origin_image)
    control_image.save(f"{annotator.processor_id}.png")

    # Prompts should start with the "Context_Control. " prefix.
    control_prompt = "Context_Control. "
    prompt = f"{control_prompt}A beautiful girl in light blue is dancing against a dreamy starry sky with interweaving light and shadow and exquisite details."
    negative_prompt = "Mesh, regular grid, blurry, low resolution, low quality, distorted, deformed, wrong anatomy, distorted hands, distorted body, distorted face, distorted hair, distorted eyes, distorted mouth"
    image = pipe(prompt, seed=1, negative_prompt=negative_prompt, context_image=control_image, height=1024, width=1024)
    image.save(f"image_{annotator.processor_id}.png")
```
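
If you already have a control map prepared (for example, an edge or depth image computed elsewhere), you can skip the annotator step and pass it to the pipeline directly. A minimal sketch, assuming the pipeline and LoRA are loaded as above; `my_canny.png` is a hypothetical pre-computed control image:

```python
from PIL import Image

# Hypothetical pre-computed control map; any of the supported condition
# types (canny, depth, lineart, softedge, normal, openpose) works here.
control_image = Image.open("my_canny.png").resize((1024, 1024))

# The prompt keeps the recommended "Context_Control. " prefix.
prompt = "Context_Control. A beautiful girl in light blue is dancing against a dreamy starry sky."
image = pipe(prompt, seed=1, context_image=control_image, height=1024, width=1024)
image.save("image_custom_control.png")
```

Here `my_canny.png` is only a placeholder for your own control map; the call itself mirrors the loop above.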