Update README.md
README.md
CHANGED
FastWan2.1-T2V-14B-480P-Diffusers is built upon Wan-AI/Wan2.1-T2V-14B-Diffusers.
## Model Overview

- 3-step inference is supported and achieves up to a **50x speedup** in the denoising loop on a single **H100** GPU.
- Our model is trained at **61×448×832** resolution, but it supports generating videos at any resolution (e.g., 480P or 720P), though quality may degrade.
- Finetuning and inference scripts are available in the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository:
  - [1 Node/GPU debugging finetuning script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/distill/v1_distill_dmd_wan_VSA.sh)
  - [Slurm training example script](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan2.1-T2V/Wan-Syn-Data-480P/distill_dmd_VSA_t2v_14B.slurm)
- Inference script in FastVideo (a Python sketch follows this list):
```bash
#!/bin/bash

num_gpus=1
export FASTVIDEO_ATTENTION_BACKEND=VIDEO_SPARSE_ATTN
export MODEL_BASE=FastVideo/FastWan2.1-T2V-14B-480P-Diffusers
# export MODEL_BASE=hunyuanvideo-community/HunyuanVideo
# You can either use --prompt or --prompt-txt, but not both.
# Runs 3 DMD denoising steps (1000, 757, 522) with 90% VSA sparsity.
fastvideo generate \
    --model-path $MODEL_BASE \
    --sp-size $num_gpus \
    --tp-size 1 \
    --num-gpus $num_gpus \
    --height 720 \
    --width 1280 \
    --num-frames 81 \
    --num-inference-steps 3 \
    --fps 16 \
    --prompt-txt assets/prompt.txt \
    --negative-prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
    --seed 1024 \
    --output-path outputs_video_dmd/ \
    --VSA-sparsity 0.9 \
    --dmd-denoising-steps "1000,757,522"
```
- Try it out on **FastVideo**: we support a wide range of GPUs, from **H100** to **4090**, and we also support **Mac** users!
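For programmatic use, the model can also be driven from Python. Below is a minimal sketch assuming FastVideo's `VideoGenerator` quick-start API; the prompt and output path are illustrative placeholders, and keyword names should be checked against the FastVideo docs.

```python
# Minimal sketch of programmatic inference, assuming FastVideo's
# `VideoGenerator` API; the prompt and output path are placeholders.
from fastvideo import VideoGenerator


def main():
    # Load this checkpoint; num_gpus mirrors the CLI's --num-gpus flag.
    generator = VideoGenerator.from_pretrained(
        "FastVideo/FastWan2.1-T2V-14B-480P-Diffusers",
        num_gpus=1,
    )
    # Generate a clip and save it to the output directory.
    generator.generate_video(
        "A curious fox explores a snowy forest at dawn.",
        output_path="outputs_video_dmd/",
        save_video=True,
    )


if __name__ == "__main__":
    main()
```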
### Training Infrastructure