---
license: apache-2.0
---

# FastVideo FastWan2.2-TI2V-5B-Diffusers Model
<p align="center">
<img src="https://raw.githubusercontent.com/hao-ai-lab/FastVideo/main/assets/logo.jpg" width="200"/>
</p>
<div>
<div align="center">
<a href="https://github.com/hao-ai-lab/FastVideo" target="_blank">FastVideo Team</a>&emsp;
</div>

<div align="center">
<a href="https://arxiv.org/pdf/2505.13389">Paper</a> |
<a href="https://github.com/hao-ai-lab/FastVideo">GitHub</a>
</div>
</div>

## Introduction
We're excited to introduce the **FastWan2.2 series**, a new line of models finetuned with our novel **Sparse-distill** strategy. This approach jointly integrates DMD and VSA in a single training process, combining the benefits of **distillation**, which shortens the number of diffusion steps, with **sparse attention**, which reduces attention computation, enabling even faster video generation.

FastWan2.2-TI2V-5B-Diffusers is built upon Wan-AI/Wan2.2-TI2V-5B-Diffusers. It supports efficient **3-step inference** and produces high-quality videos at 121×704×1280 resolution. For training, we simulate the generator's forward pass, making the process data-free. The current FastWan2.2-TI2V-5B-Diffusers model is trained with DMD only.
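
For Python use, the same model can be driven from FastVideo's `VideoGenerator` API instead of the CLI. The snippet below is a minimal sketch, assuming a recent FastVideo release; the keyword arguments mirror the CLI flags in the inference script below and may differ across versions.

```python
# Minimal sketch: 3-step inference through the FastVideo Python API.
# Assumes `pip install fastvideo`; keyword names mirror the CLI flags of
# `fastvideo generate` and may differ across FastVideo versions.
from fastvideo import VideoGenerator

generator = VideoGenerator.from_pretrained(
    "FastVideo/FastWan2.2-TI2V-5B-Diffusers",
    num_gpus=1,
)
generator.generate_video(
    "A curious red fox explores a snowy forest at dawn.",  # example prompt
    height=704,
    width=1280,
    num_frames=121,
    num_inference_steps=3,
    seed=1024,
    output_path="outputs_video_dmd/",
)
```
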
---

## Model Overview

- 3-step inference is supported.
- Our model is trained at **121×704×1280** (frames × height × width) resolution, but it supports generating videos at **any resolution** (quality may degrade).
- Finetuning and inference scripts are available in the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository:
  - [Single-node/GPU debugging finetuning script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/distill/v1_distill_dmd_wan.sh)
  - [Slurm training example script](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan2.2-TI2V-5B-Diffusers/Data-free/distill_dmd_t2v_5B.sh)
  - [Inference script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/inference/v1_inference_wan_dmd.sh)
```bash
num_gpus=1
export FASTVIDEO_ATTENTION_BACKEND=FLASH_ATTN
export MODEL_BASE=FastVideo/FastWan2.2-TI2V-5B-Diffusers
# You can either use --prompt or --prompt-txt, but not both.
fastvideo generate \
  --model-path $MODEL_BASE \
  --sp-size $num_gpus \
  --tp-size 1 \
  --num-gpus $num_gpus \
  --height 704 \
  --width 1280 \
  --num-frames 121 \
  --num-inference-steps 3 \
  --fps 24 \
  --prompt-txt assets/prompt.txt \
  --negative-prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
  --seed 1024 \
  --output-path outputs_video_dmd/ \
  --dmd-denoising-steps "1000,757,522"
```
- Try it out with **FastVideo**: we support a wide range of GPUs, from **H100** to **4090**, and we support **Mac** users too!
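
The `--dmd-denoising-steps "1000,757,522"` flag above pins the three timesteps at which the distilled generator is queried. As intuition, a few-step DMD-style sampler alternates between predicting clean latents and re-noising them down to the next timestep. The sketch below illustrates that general recipe, not FastVideo's exact implementation; `denoise` is a hypothetical stand-in for the distilled generator.

```python
import torch

# Schematic few-step sampling loop (illustration only). `denoise` stands in
# for the distilled generator, which predicts clean latents from noisy
# latents at timestep t in a single forward pass.
def few_step_sample(denoise, latents, timesteps=(1000, 757, 522)):
    for i, t in enumerate(timesteps):
        x0 = denoise(latents, t)  # one generator forward pass per step
        if i + 1 < len(timesteps):
            # Re-noise the prediction down to the next (lower) timestep,
            # here with a rectified-flow interpolation, sigma = t / 1000.
            sigma = timesteps[i + 1] / 1000.0
            noise = torch.randn_like(x0)
            latents = (1.0 - sigma) * x0 + sigma * noise
        else:
            latents = x0  # the last prediction is the final sample
    return latents
```
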

### Training Infrastructure

Training was conducted on **8 nodes with 64 H200 GPUs** in total, using a global batch size of 64, and ran for **3,000 steps (~12 hours)**.
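
For scale: 64 H200 GPUs across 8 nodes is 8 GPUs per node, and a global batch size of 64 works out to one sample per GPU per step (assuming no gradient accumulation), so 3,000 steps cover 3,000 × 64 = 192,000 samples.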
64
+
65
+ If you use the FastWan2.2-TI2V-5B-Diffusers model for your research, please cite our paper:
66
+ ```
67
+ @article{zhang2025vsa,
68
+ title={VSA: Faster Video Diffusion with Trainable Sparse Attention},
69
+ author={Zhang, Peiyuan and Huang, Haofeng and Chen, Yongqi and Lin, Will and Liu, Zhengzhong and Stoica, Ion and Xing, Eric and Zhang, Hao},
70
+ journal={arXiv preprint arXiv:2505.13389},
71
+ year={2025}
72
+ }
73
+ @article{zhang2025fast,
74
+ title={Fast video generation with sliding tile attention},
75
+ author={Zhang, Peiyuan and Chen, Yongqi and Su, Runlong and Ding, Hangliang and Stoica, Ion and Liu, Zhengzhong and Zhang, Hao},
76
+ journal={arXiv preprint arXiv:2502.04507},
77
+ year={2025}
78
+ }
79
+ ```