A strange discovery
The Qwen Image model only works well for text-to-image generation. When using it for low-noise image-to-image or inpainting on images generated by other models, the results turn out especially poor.
May I ask in which format you used Qwen2.5-VL-7b? If it's in GGUF, did you connect the mmproj file?
(Sorry for the machine translation; I'm not a native speaker.)
got prompt
Using xformers attention in VAE
Using xformers attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
gguf qtypes: Q6_K (198), F32 (141)
Dequantizing token_embd.weight to prevent runtime OOM.
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load QwenImageTEModel_
loaded completely 8513.38207321167 6145.576171875 True
Requested to load WanVAE
0 models unloaded.
loaded partially 128.0 127.9998779296875 0
0 models unloaded.
loaded partially 128.0 127.9998779296875 0
gguf qtypes: F32 (1087), BF16 (6), Q5_K (28), Q4_K (580), Q6_K (232)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load QwenImage
loaded partially 8447.846038879394 8447.280517578125 0
Attempting to release mmap (549)
Processing interrupted
Prompt executed in 68.81 seconds
The results appear significantly better than in my earlier tests, possibly due to an upgrade to the Lightning 8steps LoRA.
These are just my guesses, but the missing mmproj (that is, the "vision" part of the text encoder) could affect the quality of the model's output images, since the Qwen Image documentation mentions that for img2img the input image is also fed into Qwen2.5-VL. Then again, I've only seen one repo in which Qwen2.5-VL was quantized to GGUF without splitting the vision part into a separate file, so I'm not sure that's the cause.
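For what it's worth, you can check whether a given GGUF actually contains the vision weights or whether they were split out into a separate mmproj file by listing its tensor names. Below is a minimal sketch using the gguf Python package from the llama.cpp project; the "v." and "mm." prefixes are the conventional names for vision-encoder and projector tensors, so treat that prefix check as an assumption rather than a guarantee.

    # Minimal sketch: inspect a GGUF file for vision/projector tensors.
    # Assumes the `gguf` package from llama.cpp (pip install gguf) and that
    # vision-encoder / projector tensors use the conventional "v." / "mm." prefixes.
    import sys
    from gguf import GGUFReader

    def summarize(path: str) -> None:
        reader = GGUFReader(path)
        names = [t.name for t in reader.tensors]
        vision = [n for n in names if n.startswith(("v.", "mm."))]
        print(f"{path}: {len(names)} tensors total, {len(vision)} vision/projector tensors")
        if not vision:
            print("  -> no vision tensors found; the 'vision' part likely lives in a separate mmproj file")

    if __name__ == "__main__":
        for p in sys.argv[1:]:
            summarize(p)

Running it on both the text-encoder GGUF and, if you have one, its mmproj should make it obvious which file carries the vision weights.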
You mentioned "mmproj," which reminded me that I have installed a custom node, https://github.com/judian17/ComfyUI-joycaption-beta-one-GGUF, which requires placing the llama-joycaption-beta-one-llava-mmproj-model-f16.gguf file in the ComfyUI\models\llava_gguf\ directory. However, I did not use it in this Qwen Image workflow.
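If it helps, a throwaway check like the one below confirms the joycaption mmproj is where that node expects it; the ComfyUI root path is an assumption, so adjust it to your install. Either way, that file only serves the joycaption node and isn't referenced by the Qwen Image workflow.

    # Quick check: confirm the joycaption mmproj file is in the expected directory.
    # The ComfyUI root below is an assumed install location; adjust to your setup.
    from pathlib import Path

    comfyui_root = Path(r"C:\ComfyUI")  # assumption
    mmproj = comfyui_root / "models" / "llava_gguf" / "llama-joycaption-beta-one-llava-mmproj-model-f16.gguf"
    print(mmproj, "exists" if mmproj.exists() else "missing")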