Multi-GPU / Parallel Processing Support

#3 opened by Iotcv

We are trying to run this model on multiple GPUs, but noticed that it currently only utilizes a single GPU, which leads to out-of-memory (OOM) errors.
Any guidance on best practices for running this model across multiple GPUs would be very helpful.
Looking forward to exploring more with this model.
Thanks!

Thank you for your interest. You can try the snippets below to enable multi-GPU / parallel processing:

...
import torch.distributed as dist

# torchrun launches one process per GPU and sets the env vars
# (RANK, WORLD_SIZE, MASTER_ADDR, ...) that init_process_group reads.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()

...
# Pin each process's pipeline to its own GPU (on a single node,
# rank matches the local GPU index).
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device=f"cuda:{rank}")

...
# Offset the seed by rank so every GPU samples a different image.
image = pipeline.generate_image(
    ...
    seed=42 + rank,
)[0]
image.save(f"./assets/output_{rank}.png")

Then use torchrun to launch one process per GPU:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc-per-node=8 your_scripts.py
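For reference, here is a minimal end-to-end sketch of how the pieces fit together. The import path for NextStepPipeline, the checkpoint name, the loading calls, and the prompt are assumptions for illustration (check the model card for the exact loading code); only the distributed scaffolding matches the snippets above:

import torch
import torch.distributed as dist
from transformers import AutoModel, AutoTokenizer

# Hypothetical import path and checkpoint name; substitute the ones
# from the model card.
from models.gen_pipeline import NextStepPipeline

MODEL_PATH = "stepfun-ai/NextStep-1-Large"

# One process per GPU; torchrun provides the rendezvous env vars.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_PATH, trust_remote_code=True, torch_dtype=torch.bfloat16
)
pipeline = NextStepPipeline(tokenizer=tokenizer, model=model).to(device=f"cuda:{rank}")

# Assumes generate_image accepts a prompt string plus the seed keyword
# shown above; each rank writes its own output file.
image = pipeline.generate_image(
    "a photo of a cat",  # hypothetical prompt
    seed=42 + rank,
)[0]
image.save(f"./assets/output_{rank}.png")

dist.destroy_process_group()  # clean shutdown once generation is done

Note that this is plain data parallelism: every GPU holds a full copy of the model and generates its own image, which increases throughput but does not reduce per-GPU memory. If a single copy of the model already OOMs on one GPU, you would need to shard the model across devices instead (for example with device_map="auto" via accelerate).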
