File size: 10,654 Bytes
3cdedf0
 
 
 
 
 
 
 
 
 
 
 
 
 
5fad88a
3cdedf0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5fad88a
3cdedf0
5fad88a
3cdedf0
5fad88a
 
3cdedf0
5fad88a
3cdedf0
 
5fad88a
3cdedf0
5fad88a
3cdedf0
5fad88a
3cdedf0
5fad88a
 
 
3cdedf0
 
5fad88a
 
3cdedf0
5fad88a
3cdedf0
5fad88a
3cdedf0
5fad88a
3cdedf0
5fad88a
3cdedf0
5fad88a
 
 
 
 
3cdedf0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
---
license: apache-2.0
language:
- en
- zh
library_name: diffusers
pipeline_tag: image-to-image
---
<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_edit_logo.png" width="400"/>
<p>
<p align="center">
          💜 <a href="https://chat.qwen.ai/"><b>Qwen Chat</b></a>&nbsp&nbsp | &nbsp&nbsp🤗 <a href="https://huggingface.co/Qwen/Qwen-Image-Edit">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/models/Qwen/Qwen-Image-Edit">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf">Tech Report</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://qwenlm.github.io/blog/qwen-image-edit/">Blog</a> &nbsp&nbsp 
<br>
🖥️ <a href="https://huggingface.co/spaces/Qwen/Qwen-Image-Edit">Demo</a>&nbsp&nbsp | &nbsp&nbsp💬 <a href="https://github.com/QwenLM/Qwen-Image/blob/main/assets/wechat.png">WeChat (微信)</a>&nbsp&nbsp | &nbsp&nbsp🫨 <a href="https://discord.gg/CV4E9rpNSD">Discord</a>&nbsp&nbsp| &nbsp&nbsp <a href="https://github.com/QwenLM/Qwen-Image">Github</a>&nbsp&nbsp
</p>

<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_homepage.jpg" width="1600"/>
<p>


# Introduction
We are excited to introduce Qwen-Image-Edit, the image editing version of Qwen-Image. Built upon the 20B-parameter Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image’s unique text rendering capabilities to image editing tasks, enabling precise text editing. Furthermore, Qwen-Image-Edit simultaneously feeds the input image into Qwen2.5-VL (for visual-semantic control) and the VAE Encoder (for visual-appearance control), achieving dual capabilities in both semantic and appearance editing. To experience the latest model, visit [Qwen Chat](https://qwen.ai)  and select the "Image Editing" feature.

Key Features:

* **Dual Semantic/Appearance Editing**: Qwen-Image-Edit supports both low-level visual appearance editing (e.g., adding, removing, or modifying elements while keeping certain regions of the image unchanged) and high-level visual semantic editing (e.g., character creation, object rotation, style transfer—where overall pixel values may change, but semantic consistency is preserved).
* **Precise Text Editing**: Qwen-Image-Edit supports bilingual (Chinese and English) text editing, allowing direct modification—addition, deletion, or alteration—of text within images while preserving original font, size, and style.
* **Strong Cross-Benchmark Performance**: Evaluations across multiple public benchmarks show that Qwen-Image-Edit achieves state-of-the-art (SOTA) results in image editing tasks, establishing itself as a powerful foundation model for image generation.



## Quick Start

Install the latest version of diffusers
```
pip install git+https://github.com/huggingface/diffusers
```

The following contains a code snippet illustrating how to use the model to generate images based on text prompts:

```python
import os
from PIL import Image
import torch

from diffusers import QwenImageEditPipeline

pipeline = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit")
print("pipeline loaded")
pipeline.to(torch.bfloat16)
pipeline.to("cuda")
pipeline.set_progress_bar_config(disable=None)
image = Image.open("./input.png").convert("RGB")
prompt = "Change the rabbit's color to purple, with a flash light background."
inputs = {
    "image": image,
    "prompt": prompt,
    "generator": torch.manual_seed(0),
    "true_cfg_scale": 4.0,
    "negative_prompt": " ",
    "num_inference_steps": 50,
}

with torch.inference_mode():
    output = pipeline(**inputs)
    output_image = output.images[0]
    output_image.save("output_image_edit.png")
    print("image saved at", os.path.abspath("output_image_edit.png"))

```

## Showcase
One of Qwen-Image-Edit’s standout capabilities is dual semantic and appearance editing. Semantic editing refers to modifying an image while preserving its original visual semantics. For instance, let’s start with Qwen’s mascot—Capibara:
![Capibara](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片3.JPG#center)
Although every pixel in the edited image differs from the input (the leftmost image), the character identity of Capibara remains consistent. This semantic editing capability enables effortless creation and modification of original IPs. For example, using a series of prompts, we expanded the set to create a full MBTI meme series:
![MBTI meme series](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片4.JPG#center)
Another key application of semantic editing is viewpoint transformation. As shown below, Qwen-Image-Edit can not only rotate objects by 90 degrees but even by 180 degrees, revealing the back of an object:
![Viewpoint transformation 90 degrees](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片12.JPG#center)
![Viewpoint transformation 180 degrees](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片13.JPG#center)
Another example of semantic editing is style transfer. Given a portrait, Qwen-Image-Edit can easily transform it into various styles such as Studio Ghibli, which is particularly useful for creating avatars or character IDs:
![Style transfer](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片1.JPG#center)
In addition to semantic editing, appearance editing addresses a different class of editing needs. Appearance editing requires certain regions of the image to remain completely unchanged. A common example is addition, deletion, or modification.
Below, we demonstrate adding a signboard to an image. Notably, Qwen-Image-Edit not only adds the signboard but also generates a corresponding reflection:
![Adding a signboard](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片6.JPG#center)
Here’s another interesting example—removing fine strands of hair:
![Removing fine strands of hair](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片7.JPG#center)
Below shows how to modify the color of text in an image—changing the color of the letter "n" to blue:
![Modifying text color](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片8.JPG#center)
Appearance editing is also crucial in modifying human poses, backgrounds, and clothing, as demonstrated in the following three images:
![Modifying human poses](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片9.JPG#center)
![Modifying backgrounds](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片11.JPG#center)
![Modifying clothing](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片5.JPG#center)
The second hallmark of Qwen-Image-Edit is its accurate text editing, made possible by Qwen-Image’s powerful text rendering capabilities.
For example, the following two images demonstrate Qwen-Image-Edit’s ability in editing English text:
![Editing English text 1](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片15.JPG#center)
![Editing English text 2](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片16.JPG#center)
Qwen-Image-Edit can also edit Chinese posters—modifying both large and small text elements:
![Editing Chinese posters](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片17.JPG#center)
Finally, let’s walk through a concrete example showing how sequential editing can correct errors in a calligraphy artwork originally generated by Qwen-Image:
![Calligraphy artwork](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片18.JPG#center)
This artwork contains several incorrect characters. We can progressively correct them using Qwen-Image-Edit. For instance, we can add bounding boxes directly on the original image and instruct Qwen-Image-Edit to fix the highlighted parts—here, correcting “稽” within the red box and “亭” within the blue box:
![Correcting characters](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片19.JPG#center)
Unfortunately, the character “稽” is uncommon, and the model initially fails to correct it—the lower-right component should be “旨”, not “日”. We can further highlight the incorrect “日” with a red box and prompt Qwen-Image-Edit to fine-tune that region into “旨”:
![Fine-tuning character](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片20.JPG#center)
Amazing, right? Following this iterative approach, we can progressively correct all errors until reaching the final version:
![Final version 1](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片21.JPG#center)
![Final version 2](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片22.JPG#center)
![Final version 3](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片23.JPG#center)
![Final version 4](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片24.JPG#center)
![Final version 5](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/edit_en/幻灯片25.JPG#center)
Ultimately, we obtain a fully correct calligraphy version of Lantingji Xu (Preface to the Poems Composed at the Orchid Pavilion)!
In summary, we hope Qwen-Image-Edit will further advance the field of image generation, significantly lower the technical barriers to visual content creation, and inspire even more innovative applications.


## License Agreement

Qwen-Image is licensed under Apache 2.0. 

## Citation

We kindly encourage citation of our work if you find it useful.

```bibtex
@misc{wu2025qwenimagetechnicalreport,
      title={Qwen-Image Technical Report}, 
      author={Chenfei Wu and Jiahao Li and Jingren Zhou and Junyang Lin and Kaiyuan Gao and Kun Yan and Sheng-ming Yin and Shuai Bai and Xiao Xu and Yilei Chen and Yuxiang Chen and Zecheng Tang and Zekai Zhang and Zhengyi Wang and An Yang and Bowen Yu and Chen Cheng and Dayiheng Liu and Deqing Li and Hang Zhang and Hao Meng and Hu Wei and Jingyuan Ni and Kai Chen and Kuan Cao and Liang Peng and Lin Qu and Minggang Wu and Peng Wang and Shuting Yu and Tingkun Wen and Wensen Feng and Xiaoxiao Xu and Yi Wang and Yichang Zhang and Yongqiang Zhu and Yujia Wu and Yuxuan Cai and Zenan Liu},
      year={2025},
      eprint={2508.02324},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.02324}, 
}
```