Releasing FP8 & FP16 Models

#9 opened by TatsuyaXAI

First of all, thank you for the open-source models. Qwen is bringing huge growth to the open-source development of LLMs and now image generation.

I hope that in the future FP8 and FP16 models will also be released for lower-end GPUs with only 8–16 GB of VRAM.

It would be great to have multiple models, such as one focused on realism and another on animation, similar to the fine-tuned models of SDXL and SD 1.5.

Since these large models are mostly practical for enterprises and very difficult for personal or retail users to run, smaller optimized versions would be a big help.

Again, thank you for the superb model.

It can be converted directly through Diffusers, right?

https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a39afdf4aa9e784e43afc0

In the process of finding out right now.
Will let you know.
The downloads are killing me, softly.

Why would you use FP16 instead of BF16, though?
If your GPU doesn't support BF16, I don't think you could even run this.

Wait for an FP8 scaled model from Kijai (smart scaling is much better than a naive truncated FP8). A rough sketch of the difference follows below.
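For anyone wondering why scaling matters: here is a minimal PyTorch sketch of naive truncation versus per-tensor scaled FP8. This is only an illustration of the general idea, not Kijai's actual conversion code.

```python
import torch

def naive_fp8(w: torch.Tensor) -> torch.Tensor:
    # Naive truncation: cast directly to FP8. Values beyond the FP8 range clip,
    # and small weights lose most of their precision.
    return w.to(torch.float8_e4m3fn)

def scaled_fp8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # "Scaled" FP8: store a scale alongside the weights so they use the full
    # FP8 dynamic range; dequantize later with w ≈ q.float() * scale.
    scale = w.abs().max() / torch.finfo(torch.float8_e4m3fn).max
    q = (w / scale).to(torch.float8_e4m3fn)
    return q, scale
```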

Both the bitsandbytes code and torchao code are now functional.
They can be found here:

bitsandbytes: ~17GB VRAM
https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a3f2b63a24e2df78974f5d

torchao: ~23GB VRAM
https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6#68a4013ec45c7fbadef91472
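For reference, a minimal sketch of the bitsandbytes route through Diffusers. The class names follow recent Diffusers releases (QwenImageEditPipeline, QwenImageTransformer2DModel, BitsAndBytesConfig); check the linked comments above for the exact working code and measured VRAM numbers.

```python
import torch
from diffusers import (
    QwenImageEditPipeline,
    QwenImageTransformer2DModel,
    BitsAndBytesConfig,  # Diffusers' wrapper around bitsandbytes quantization
)

# 4-bit NF4 quantization of the transformer; actual VRAM use depends on your setup.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image-Edit",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle components to keep peak VRAM down
```

The torchao route in the second link follows the same pattern, swapping the quantization config for Diffusers' `TorchAoConfig`.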

NielsGx: There's a "fast fp16_accumulation" mode that makes FP16 faster on some cards (NVIDIA, as far as I know). It shows up as "fp16_fast" in some ComfyUI nodes, I believe. So that'd be *a* reason.

Found the reference, from the Kijai Wan 2.1 T2V workflow: "fp16_fast enables the 'Full FP16 Accumulation in FP16 GEMMs' feature available in the very latest PyTorch nightly; this is around a 20% speed boost."

So that's *a* reason, if you've got VRAM to burn.
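If you want to try it outside ComfyUI: a minimal sketch of flipping the PyTorch-level switch that "fp16_fast" wraps, assuming a PyTorch build recent enough to expose the flag. It only matters when the model actually runs in FP16 (not BF16).

```python
import torch

# Accumulate FP16 GEMMs in FP16 instead of FP32: faster on some NVIDIA GPUs,
# at the cost of reduced numerical accuracy. Guarded so older builds are a no-op.
if hasattr(torch.backends.cuda.matmul, "allow_fp16_accumulation"):
    torch.backends.cuda.matmul.allow_fp16_accumulation = True
```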
