Colab notebook inference
I've been trying to get it to work on Google Colab, but so far without luck. I wish the community would make this more accessible with tutorials for using StableAvatar on ComfyUI and Colab. Here is the notebook I have so far:
from google.colab import drive
drive.mount('/content/drive')
!pip uninstall xformers -y
!pip uninstall diffusers transformers torch torchvision torchaudio -y
# Install PyTorch first
!pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# Install diffusers and transformers WITHOUT xformers
!pip install diffusers==0.21.4 transformers==4.35.0 accelerate
# Install other dependencies
!pip install opencv-python librosa soundfile Pillow numpy matplotlib tqdm einops omegaconf safetensors huggingface-hub audio-separator mediapipe scipy imageio[ffmpeg] moviepy
# Set environment to disable xformers
# (%env persists for the session; a variable set with "!export" is lost when its subshell exits)
%env XFORMERS_DISABLED=1
# Clone fresh
%cd /content
!rm -rf StableAvatar
!git clone https://github.com/Francis-Rings/StableAvatar.git
%cd /content/StableAvatar
# Download models
!pip install "huggingface_hub[cli]"
!huggingface-cli download FrancisRing/StableAvatar --local-dir ./checkpoints
%cd /content/StableAvatar
!pip install -r requirements.txt
%cd /content/StableAvatar
# Use the official inference.sh parameters from the repository
!CUDA_VISIBLE_DEVICES=0 python inference.py \
  --config_path="deepspeed_config/wan2.1/wan_civitai.yaml" \
  --pretrained_model_name_or_path="./checkpoints/Wan2.1-Fun-V1.1-1.3B-InP" \
  --transformer_path="./checkpoints/StableAvatar-1.3B/transformer3d-square.pt" \
  --pretrained_wav2vec_path="./checkpoints/wav2vec2-base-960h" \
  --validation_reference_path="/content/drive/MyDrive/StableAvatar/images/person7.jpg" \
  --validation_driven_audio_path="/content/drive/MyDrive/StableAvatar/audio/speech2.wav" \
  --output_dir="/content/drive/MyDrive/StableAvatar/output_official" \
  --validation_prompts="A stunning anime female singer with colorful hair performing with electric guitar, passionate singing expression, futuristic tropical cyberpunk environment with neon palm trees and holographic elements, Japanese anime art style, vibrant pink and blue lighting, sci-fi paradise setting" \
  --width=512 \
  --height=512 \
  --sample_steps=50 \
  --overlap_window_length=15 \
  --clip_sample_n_frames=81 \
  --motion_frame=60 \
  --GPU_memory_mode="model_full_load" \
  --sample_text_guide_scale=8.0 \
  --sample_audio_guide_scale=8.0 \
  --seed=42
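Since the inference command depends on several files arriving under ./checkpoints, it may help to check them before launching. The following is only a sketch: the paths are copied from the command above, /content/StableAvatar is assumed to be the clone location, and `find_missing` is a hypothetical helper name, not part of the repository.

```python
from pathlib import Path

# Paths copied from the inference command, relative to the repository root.
REQUIRED_PATHS = [
    "deepspeed_config/wan2.1/wan_civitai.yaml",
    "checkpoints/Wan2.1-Fun-V1.1-1.3B-InP",
    "checkpoints/StableAvatar-1.3B/transformer3d-square.pt",
    "checkpoints/wav2vec2-base-960h",
]


def find_missing(repo_root, relative_paths=REQUIRED_PATHS):
    """Return the entries of relative_paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).exists()]


if __name__ == "__main__":
    # Assumption: the notebook cloned the repository into /content/StableAvatar.
    missing = find_missing("/content/StableAvatar")
    if missing:
        print("Missing before inference:")
        for p in missing:
            print("  ", p)
    else:
        print("All expected files and directories are present.")
```

If anything is reported missing, the download step probably did not finish or wrote to a different directory.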
Thanks for the notebook, but my runtime keeps running out of memory before it even loads the model. Judging from the model's specs, I guess it should be able to run on the Colab T4 GPU with --GPU_memory_mode="sequential_cpu_offload".
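One way to try the offload idea without retyping the long command is to build it in Python and change only the memory mode. This is a rough sketch, not a tested fix: `build_inference_command` and `build_env` are hypothetical helpers, the flag names and values are copied from the notebook, and "sequential_cpu_offload" is assumed to be an accepted value of --GPU_memory_mode based on the comment above. Whether it is enough for a T4 is untested here.

```python
import os
import shlex


def build_inference_command(prompt, gpu_memory_mode="sequential_cpu_offload", **overrides):
    """Build the argv list for inference.py using the notebook's settings."""
    args = {
        "config_path": "deepspeed_config/wan2.1/wan_civitai.yaml",
        "pretrained_model_name_or_path": "./checkpoints/Wan2.1-Fun-V1.1-1.3B-InP",
        "transformer_path": "./checkpoints/StableAvatar-1.3B/transformer3d-square.pt",
        "pretrained_wav2vec_path": "./checkpoints/wav2vec2-base-960h",
        "validation_reference_path": "/content/drive/MyDrive/StableAvatar/images/person7.jpg",
        "validation_driven_audio_path": "/content/drive/MyDrive/StableAvatar/audio/speech2.wav",
        "output_dir": "/content/drive/MyDrive/StableAvatar/output_official",
        "validation_prompts": prompt,
        "width": 512,
        "height": 512,
        "sample_steps": 50,
        "overlap_window_length": 15,
        "clip_sample_n_frames": 81,
        "motion_frame": 60,
        "GPU_memory_mode": gpu_memory_mode,
        "sample_text_guide_scale": 8.0,
        "sample_audio_guide_scale": 8.0,
        "seed": 42,
    }
    args.update(overrides)  # any flag can be replaced by keyword
    return ["python", "inference.py"] + [f"--{k}={v}" for k, v in args.items()]


def build_env():
    """Copy the current environment and add the variables the notebook sets."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = "0"
    env["XFORMERS_DISABLED"] = "1"
    return env


cmd = build_inference_command("your prompt here")
print(shlex.join(cmd))  # inspect the command before running it

# To launch it from /content/StableAvatar, uncomment:
# import subprocess
# subprocess.run(cmd, env=build_env(), check=True)
```

Passing `gpu_memory_mode="model_full_load"` reproduces the original setting, so the two modes can be compared with everything else held fixed.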