The Flux Kontext model with the transformer and T5 text encoder quantized to 4-bit FP4 via bitsandbytes.

Usage

Install bitsandbytes first:

pip install bitsandbytes

Then load the pipeline:

from diffusers import FluxKontextPipeline
import torch

pipeline = FluxKontextPipeline.from_pretrained(
    "eramth/flux-kontext-4bit-fp4",
    torch_dtype=torch.float16,
).to("cuda")

# Tiled VAE decoding lets you generate higher-resolution images without much extra VRAM usage.
pipeline.vae.enable_tiling()
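
A minimal editing call could then look like the sketch below; the input image URL and prompt are placeholders, and guidance_scale / num_inference_steps are typical values for FLUX.1 Kontext rather than requirements.

from diffusers.utils import load_image

# Placeholder input image and edit instruction.
input_image = load_image("https://example.com/input.png")
image = pipeline(
    image=input_image,
    prompt="Change the background to a sunset over the ocean",
    guidance_scale=2.5,
    num_inference_steps=28,
).images[0]
image.save("output.png")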

You can create this quantized model yourself with the following script:

from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import FluxKontextPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel
import torch

token = ""    # your Hugging Face access token
repo_id = ""  # target repo id for push_to_hub

quant_config = TransformersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="fp4",
)

text_encoder_2_4bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    token=token
)

quant_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="fp4",
)

transformer_4bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    token=token
)

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",
    transformer=transformer_4bit,
    text_encoder_2=text_encoder_2_4bit,
    torch_dtype=torch.float16,
    token=token
)

pipe.push_to_hub(repo_id, token=token)
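
The checkpoint pushed this way loads directly with FluxKontextPipeline.from_pretrained as in the usage example above. If you want a rough sanity check of the memory savings before pushing, the sketch below assumes a diffusers/transformers version that exposes get_memory_footprint on both quantized models:

# Approximate on-device size of the 4-bit components.
print(f"transformer: {transformer_4bit.get_memory_footprint() / 1024**3:.2f} GiB")
print(f"text_encoder_2: {text_encoder_2_4bit.get_memory_footprint() / 1024**3:.2f} GiB")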