SahilCarterr committed · Commit db33f70 · verified · 1 Parent(s): 92f361c

Update README.md

Files changed (1)
  1. README.md +25 -20
README.md CHANGED
@@ -5,54 +5,59 @@ tasks:
  - text-to-image-synthesis

  #model-type:
- ## e.g., gpt, phi, llama, chatglm, baichuan
  #- gpt

  #domain:
- ## e.g., nlp, cv, audio, multi-modal
  #- nlp

  #language:
- ## Language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
  #- cn

  #metrics:
- ## e.g., CIDEr, BLEU, ROUGE
  #- CIDEr

  #tags:
- ## Various custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
  #- pretrained

  #tools:
- ## e.g., vllm, fastchat, llamacpp, AdaSeq
  #- vllm
  base_model_relation: finetune
  base_model:
  - Qwen/Qwen-Image
  ---
- # Qwen-Image Full Distillation Accelerated Model

  ![](./assets/title.jpg)

- ## Model Introduction

- This model is a distilled, accelerated version of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). The original model requires 40 inference steps with classifier-free guidance (CFG) enabled, for a total of 80 model forward passes. The distilled model requires only 15 inference steps with CFG disabled, for a total of 15 forward passes, **a speed-up of about 5×**. The number of inference steps can be reduced further if needed, at some cost in generation quality.

- The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio). The training data consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb). Training ran for about 1 day on 8 × MI308X GPUs.

- ## Results Showcase

- ||Original Model|Original Model|Accelerated Model|
  |-|-|-|-|
- |Inference Steps|40|15|15|
- |CFG scale|4|1|1|
- |Forward Passes|80|15|15|
- |Example 1|![](./assets/image_1_full.jpg)|![](./assets/image_1_original.jpg)|![](./assets/image_1_ours.jpg)|
- |Example 2|![](./assets/image_2_full.jpg)|![](./assets/image_2_original.jpg)|![](./assets/image_2_ours.jpg)|
- |Example 3|![](./assets/image_3_full.jpg)|![](./assets/image_3_original.jpg)|![](./assets/image_3_ours.jpg)|

- ## Inference Code

  ```shell
  git clone https://github.com/modelscope/DiffSynth-Studio.git
@@ -75,7 +80,7 @@ pipe = QwenImagePipeline.from_pretrained(
  ],
  tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
  )
- prompt = "Exquisite portrait, underwater girl, flowing blue dress, hair gently drifting, translucent light and shadow, surrounded by bubbles, serene face, fine details, dreamy and beautiful."
  image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
  image.save("image.jpg")
  ```
 
  - text-to-image-synthesis

  #model-type:
+ ## e.g., gpt, phi, llama, chatglm, baichuan, etc.
  #- gpt

  #domain:
+ ## e.g., nlp, cv, audio, multi-modal
  #- nlp

  #language:
+ ## Language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
  #- cn

  #metrics:
+ ## e.g., CIDEr, BLEU, ROUGE, etc.
  #- CIDEr

  #tags:
+ ## Various custom tags, including pretrained, fine-tuned, instruction-tuned, RL-tuned, etc.
  #- pretrained

  #tools:
+ ## e.g., vllm, fastchat, llamacpp, AdaSeq, etc.
  #- vllm
  base_model_relation: finetune
  base_model:
  - Qwen/Qwen-Image
  ---
+ # Qwen-Image Full Distillation Accelerated Model

  ![](./assets/title.jpg)

+ ## Model Introduction

+ This model is a distilled and accelerated version of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image).
+ The original model requires 40 inference steps with classifier-free guidance (CFG), for a total of 80 forward passes.
+ The distilled model requires only 15 inference steps and no CFG, for a total of 15 forward passes, **a speed-up of about 5×**.
+ The number of inference steps can be reduced further if needed, but generation quality may decrease.

+ The training framework is built on [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio).
+ The training dataset consists of 16,000 images generated by the original model from prompts randomly sampled from [DiffusionDB](https://www.modelscope.cn/datasets/AI-ModelScope/diffusiondb).
+ Training ran for about 1 day on 8 × MI308X GPUs.

+ ## Performance Comparison

+ | | Original Model | Original Model | Accelerated Model |
  |-|-|-|-|
+ | Inference Steps | 40 | 15 | 15 |
+ | CFG Scale | 4 | 1 | 1 |
+ | Forward Passes | 80 | 15 | 15 |
+ | Example 1 | ![](./assets/image_1_full.jpg) | ![](./assets/image_1_original.jpg) | ![](./assets/image_1_ours.jpg) |
+ | Example 2 | ![](./assets/image_2_full.jpg) | ![](./assets/image_2_original.jpg) | ![](./assets/image_2_ours.jpg) |
+ | Example 3 | ![](./assets/image_3_full.jpg) | ![](./assets/image_3_original.jpg) | ![](./assets/image_3_ours.jpg) |

+ ## Inference Code

  ```shell
  git clone https://github.com/modelscope/DiffSynth-Studio.git
 
  ],
  tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
  )
+ prompt = "Delicate portrait, underwater girl, flowing blue dress, hair floating, clear light and shadows, bubbles surrounding, serene face, exquisite details, dreamy and beautiful."
  image = pipe(prompt, seed=0, num_inference_steps=15, cfg_scale=1)
  image.save("image.jpg")
  ```
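
The "about 5×" figure in the Model Introduction follows from counting forward passes. The sketch below reproduces that arithmetic in Python. `forward_passes` is a hypothetical helper, not part of DiffSynth-Studio. It assumes that an active CFG step costs two forward passes and that a CFG scale of 1 means CFG is off, which is how the comparison table reads.

```python
def forward_passes(num_inference_steps: int, cfg_scale: float) -> int:
    """Count denoising-model forward passes for one generated image.

    Assumptions (not taken from the library source): CFG is active whenever
    cfg_scale != 1, and an active CFG step needs two passes, one conditional
    and one unconditional.
    """
    passes_per_step = 1 if cfg_scale == 1 else 2
    return num_inference_steps * passes_per_step


# Settings from the comparison table in the README.
original = forward_passes(num_inference_steps=40, cfg_scale=4)
distilled = forward_passes(num_inference_steps=15, cfg_scale=1)
speedup = original / distilled

print(f"original:  {original} forward passes")   # 80
print(f"distilled: {distilled} forward passes")  # 15
print(f"speed-up:  {speedup:.2f}x")              # 5.33x
```

This counts only denoising-model passes, so the measured wall-clock speed-up may differ once text encoding and image decoding are included.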