---
frameworks:
- Pytorch
tasks:
- text-to-image-synthesis
base_model_relation: finetune
base_model:
- Qwen/Qwen-Image
---
|
# Qwen-Image Full Distillation Accelerated Model

## Model Introduction
|
|
|
This model is a distilled and accelerated version of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image).
The original model requires 40 inference steps and uses classifier-free guidance (CFG), resulting in a total of 80 forward passes.
The distilled model requires only 15 inference steps and no CFG, resulting in just 15 forward passes, **about a 5× speed-up**.
The number of inference steps can be reduced further if needed, though generation quality may degrade.

The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio).
The training dataset consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb).
Training ran for about one day on 8 × MI308X GPUs.
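The forward-pass arithmetic behind the speed-up can be sketched as follows (a back-of-the-envelope calculation using the step counts stated above, not a benchmark):

```python
# Step counts from the model card above.
original_steps, distilled_steps = 40, 15

# With CFG, each step runs two forward passes (conditional + unconditional);
# the distilled model drops CFG, so it runs one pass per step.
original_passes = original_steps * 2    # 40 steps x 2 passes = 80
distilled_passes = distilled_steps * 1  # 15 steps x 1 pass  = 15

speedup = original_passes / distilled_passes
print(f"{original_passes} vs {distilled_passes} passes -> {speedup:.1f}x")  # -> 5.3x
```

Note that this counts diffusion-transformer forward passes only; end-to-end wall-clock gains also depend on the text encoder, VAE decode, and hardware.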
|
|
|
## Performance Comparison
|
|
|
| | Original Model | Original Model (reduced steps) | Accelerated Model |
|-|-|-|-|
| Inference Steps | 40 | 15 | 15 |
| CFG Scale | 4 | 1 | 1 |
| Forward Passes | 80 | 15 | 15 |
|
|
|
## Inference Code
|
|
|
```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
|
|
|
```python
import torch
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

# Load the distilled DiT weights together with the original text encoder,
# VAE, and tokenizer from Qwen/Qwen-Image.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Distill-Full", origin_file_pattern="diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

prompt = "Delicate portrait, underwater girl, flowing blue dress, hair floating, clear light and shadows, bubbles surrounding, serene face, exquisite details, dreamy and beautiful."
# 15 steps, CFG disabled (cfg_scale=1): 15 forward passes in total.
image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
image.save("image.jpg")
```
|
|
|
## License

This model is released under the Apache-2.0 license.
|
|