Colab notebook inference

#2
by Jokality - opened

I've been trying to get it to work on Google Colab, but I haven't had any luck. I just wish the community would make this more accessible by providing tutorials for using StableAvatar on ComfyUI and Colab.

from google.colab import drive
drive.mount('/content/drive')
!pip uninstall xformers -y
!pip uninstall diffusers transformers torch torchvision torchaudio -y

Install PyTorch first

!pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121

Install diffusers and transformers WITHOUT xformers

!pip install diffusers==0.21.4 transformers==4.35.0 accelerate

Install other dependencies

!pip install opencv-python librosa soundfile Pillow numpy matplotlib tqdm einops omegaconf safetensors huggingface-hub audio-separator mediapipe scipy imageio[ffmpeg] moviepy
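Since these cells pin specific versions, and the later `pip install -r requirements.txt` step may change them again, it can help to print what is actually installed before running inference. This is only a minimal sketch using the standard library; `installed_versions` is a made-up helper name, and the package list is just the packages touched by the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages pinned or uninstalled in the install cells of this post.
checked = ["torch", "torchvision", "torchaudio", "diffusers", "transformers", "xformers"]
for name, ver in installed_versions(checked).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running it once after the pinned installs and once after the requirements install shows whether any pin was overridden.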

Set environment to disable xformers

%env XFORMERS_DISABLED=1

Mount drive

from google.colab import drive
drive.mount('/content/drive')

Clone fresh

!cd /content && rm -rf StableAvatar
!git clone https://github.com/Francis-Rings/StableAvatar.git
%cd StableAvatar

Download models

!pip install "huggingface_hub[cli]"
!huggingface-cli download FrancisRing/StableAvatar --local-dir ./checkpoints
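Before moving on, it may be worth confirming that the download produced the files the inference command expects. The sketch below assumes the three paths used later in this post (relative to `./checkpoints`); `missing_checkpoints` is a hypothetical helper, not part of the StableAvatar repository.

```python
from pathlib import Path

# Paths taken from the inference command in this post, relative to ./checkpoints.
EXPECTED = [
    "Wan2.1-Fun-V1.1-1.3B-InP",
    "StableAvatar-1.3B/transformer3d-square.pt",
    "wav2vec2-base-960h",
]


def missing_checkpoints(root, expected=EXPECTED):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


missing = missing_checkpoints("./checkpoints")
print("missing:", missing if missing else "none")
```

An empty result only means the paths exist; it does not verify that the files downloaded completely.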

%cd /content/StableAvatar
!pip install -r requirements.txt

%cd /content/StableAvatar

Use the official inference.sh parameters from the repository

!CUDA_VISIBLE_DEVICES=0 python inference.py \
  --config_path="deepspeed_config/wan2.1/wan_civitai.yaml" \
  --pretrained_model_name_or_path="./checkpoints/Wan2.1-Fun-V1.1-1.3B-InP" \
  --transformer_path="./checkpoints/StableAvatar-1.3B/transformer3d-square.pt" \
  --pretrained_wav2vec_path="./checkpoints/wav2vec2-base-960h" \
  --validation_reference_path="/content/drive/MyDrive/StableAvatar/images/person7.jpg" \
  --validation_driven_audio_path="/content/drive/MyDrive/StableAvatar/audio/speech2.wav" \
  --output_dir="/content/drive/MyDrive/StableAvatar/output_official" \
  --validation_prompts="A stunning anime female singer with colorful hair performing with electric guitar, passionate singing expression, futuristic tropical cyberpunk environment with neon palm trees and holographic elements, Japanese anime art style, vibrant pink and blue lighting, sci-fi paradise setting" \
  --width=512 \
  --height=512 \
  --sample_steps=50 \
  --overlap_window_length=15 \
  --clip_sample_n_frames=81 \
  --motion_frame=60 \
  --GPU_memory_mode="model_full_load" \
  --sample_text_guide_scale=8.0 \
  --sample_audio_guide_scale=8.0 \
  --seed=42
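A long shell command is easy to break when a line continuation goes missing. One alternative is to assemble the same arguments as a Python list and hand it to `subprocess.run`. The sketch below copies a few flag names and values from the command above (the rest would be added the same way); `build_command` and `run_inference` are hypothetical helpers, and nothing is launched until `run_inference` is called.

```python
import subprocess
import sys


def build_command(options, script="inference.py"):
    """Turn {flag: value} into an argv list: [python, script, '--flag=value', ...]."""
    return [sys.executable, script] + [f"--{key}={value}" for key, value in options.items()]


def run_inference(cmd, cwd="/content/StableAvatar"):
    """Run the command from the repository directory and raise if it fails."""
    subprocess.run(cmd, check=True, cwd=cwd)


# A subset of the flags from this post; add the remaining ones in the same form.
options = {
    "config_path": "deepspeed_config/wan2.1/wan_civitai.yaml",
    "pretrained_model_name_or_path": "./checkpoints/Wan2.1-Fun-V1.1-1.3B-InP",
    "transformer_path": "./checkpoints/StableAvatar-1.3B/transformer3d-square.pt",
    "pretrained_wav2vec_path": "./checkpoints/wav2vec2-base-960h",
    "width": 512,
    "height": 512,
    "GPU_memory_mode": "model_full_load",
    "seed": 42,
}

cmd = build_command(options)
print(" ".join(cmd))
# run_inference(cmd)  # uncomment once the checkpoints and input files are in place
```

Because each flag is its own list element, values containing spaces (such as the prompt) need no extra shell quoting.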

Thanks for the notebook, but my runtime keeps running out of memory before it even loads the model. Judging from the model's specs, I'd guess it should be able to run on the Colab T4 GPU with "sequential_cpu_offload" as the GPU memory mode.
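To see how much GPU memory the runtime actually has before choosing a memory mode, something like the following could be run first. It only wraps `nvidia-smi` and parses its CSV output; the helper names are made up for this example. It does not measure system RAM, which may also be what runs out while the model is loading.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Parse one 'name, total_mib, used_mib' line from nvidia-smi CSV output."""
    name, total, used = [part.strip() for part in line.rsplit(",", 2)]
    return name, int(total), int(used)


def gpu_memory():
    """Return [(name, total_mib, used_mib), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_memory()
if not gpus:
    print("No GPU visible (is the runtime type set to GPU?)")
for name, total, used in gpus:
    print(f"{name}: {used} / {total} MiB used")
```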
