Multi-GPU / Parallel Processing Support

#3 opened by Iotcv

We are trying to run this model on multiple GPUs, but noticed that it currently only utilizes a single GPU, which leads to out-of-memory (OOM) errors.
Any guidance on best practices for running this model across multiple GPUs would be very helpful.
Looking forward to exploring more with this model.
Thanks!

Thank you for your interest. You can try the snippets below to enable multi-GPU / parallel processing:

...
import torch.distributed as dist

# torchrun launches one process per GPU and sets the env vars
# (RANK, WORLD_SIZE, MASTER_ADDR, ...) that init_process_group reads.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()

...
# Pin each process's pipeline to its own GPU (on a single node,
# rank matches the local GPU index).
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device=f"cuda:{rank}")

...
# Offset the seed by rank so every GPU samples a different image.
image = pipeline.generate_image(
    ...
    seed=42 + rank,
)[0]
image.save(f"./assets/output_{rank}.png")

Then use torchrun to launch one process per GPU:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc-per-node=8 your_scripts.py
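For reference, here is a minimal end-to-end sketch of how the pieces fit together. The import path for NextStepPipeline, the checkpoint name, the loading calls, and the prompt are assumptions for illustration (check the model card for the exact loading code); only the distributed scaffolding matches the snippets above:

import torch
import torch.distributed as dist
from transformers import AutoModel, AutoTokenizer

# Hypothetical import path and checkpoint name; substitute the ones
# from the model card.
from models.gen_pipeline import NextStepPipeline

MODEL_PATH = "stepfun-ai/NextStep-1-Large"

# One process per GPU; torchrun provides the rendezvous env vars.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_PATH, trust_remote_code=True, torch_dtype=torch.bfloat16
)
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device=f"cuda:{rank}")

# Assumes generate_image accepts a prompt string plus the seed keyword
# shown above; each rank writes its own output file.
image = pipeline.generate_image(
    "a photo of a cat",  # hypothetical prompt
    seed=42 + rank,
)[0]
image.save(f"./assets/output_{rank}.png")

dist.destroy_process_group()  # clean shutdown once generation is done

Note that this is plain data parallelism: every GPU holds a full copy of the model and generates its own image, which increases throughput but does not reduce per-GPU memory. If a single copy of the model already OOMs on one GPU, you would need to shard the model across devices instead (for example with device_map="auto" via accelerate).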
