---
frameworks:
- Pytorch
tasks:
- text-to-image-synthesis
base_model:
- Qwen/Qwen-Image
base_model_relation: adapter
license: apache-2.0
---

# Qwen-Image Image Structure Control Model - Depth ControlNet

![](./assets/cover.png)

## Model Introduction

This model is an image structure control model trained on top of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). Its architecture is a ControlNet, which conditions the structure of the generated image on a depth map. Training was performed with [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) on the [BLIP3o](https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k) dataset.

## Effect Demonstration

|Structure Map|Generated Image 1|Generated Image 2|
|-|-|-|
|![](./assets/depth2.jpg)|![](./assets/image2_0.jpg)|![](./assets/image2_1.jpg)|
|![](./assets/depth3.jpg)|![](./assets/image3_0.jpg)|![](./assets/image3_1.jpg)|
|![](./assets/depth1.jpg)|![](./assets/image1_0.jpg)|![](./assets/image1_1.jpg)|

## Inference Code

Install DiffSynth-Studio from source:

```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```

```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
from PIL import Image
import torch
from modelscope import dataset_snapshot_download

# Load the Qwen-Image base model (DiT, text encoder, VAE) together with
# the Blockwise ControlNet depth weights.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# Download an example depth map and resize it to the generation resolution.
dataset_snapshot_download(
    dataset_id="DiffSynth-Studio/example_image_dataset",
    local_dir="./data/example_image_dataset",
    allow_file_pattern="depth/image_1.jpg"
)
controlnet_image = Image.open("data/example_image_dataset/depth/image_1.jpg").resize((1328, 1328))

prompt = "Exquisite portrait of an underwater girl with flowing blue dress and fluttering hair. Transparent light and shadow, surrounded by bubbles. Her face is serene, with exquisite details and dreamy beauty."

# Generate an image whose structure follows the depth map.
image = pipe(
    prompt, seed=0,
    blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)]
)
image.save("image.jpg")
```
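
## Preparing a Depth Map

The pipeline expects a depth map as the structure input. If you start from an ordinary photo, you can estimate one with any monocular depth model. The sketch below uses the Hugging Face `transformers` depth-estimation pipeline with `Intel/dpt-large`; this estimator and the input path `photo.jpg` are assumptions for illustration, as the model card does not state which estimator produced its depth maps.

```python
# Minimal sketch: derive a depth map from a photo, assuming the
# transformers depth-estimation pipeline with Intel/dpt-large.
# The input path "photo.jpg" is hypothetical.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth_estimator(Image.open("photo.jpg"))

# "depth" is a grayscale PIL visualization of the predicted depth.
depth_map = result["depth"].convert("RGB").resize((1328, 1328))
depth_map.save("depth.jpg")  # pass this image to ControlNetInput(image=...)
```

The saved `depth.jpg` can then replace the example depth map in the inference code above.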