Update README.md
README.md CHANGED
@@ -2,7 +2,7 @@
 license: apache-2.0
 ---
 
-# FastVideo FastWan2.2-TI2V-5B-Diffusers Model
+# FastVideo FastWan2.2-TI2V-5B-Full-Diffusers Model
 <p align="center">
   <img src="https://raw.githubusercontent.com/hao-ai-lab/FastVideo/main/assets/logo.png" width="200"/>
 </p>
@@ -22,7 +22,7 @@ license: apache-2.0
 ## Introduction
 We're excited to introduce the **FastWan2.2 series**, a new line of models finetuned with our novel **Sparse-distill** strategy. This approach jointly integrates DMD and VSA in a single training process, combining the benefits of **distillation** (fewer diffusion steps) and **sparse attention** (reduced attention computation) for even faster video generation.
 
-FastWan2.2-TI2V-5B-Diffusers is built upon Wan-AI/Wan2.2-TI2V-5B-Diffusers. It supports efficient **3-step inference** and produces high-quality videos at 121×704×1280 resolution. For training, we used a simulated forward pass for the generator model, making the process data-free. **The current FastWan2.2-TI2V-5B-Diffusers model is trained using only DMD**.
+FastWan2.2-TI2V-5B-Full-Diffusers is built upon Wan-AI/Wan2.2-TI2V-5B-Diffusers. It supports efficient **3-step inference** and produces high-quality videos at 121×704×1280 resolution. For training, we used a simulated forward pass for the generator model, making the process data-free. **The current FastWan2.2-TI2V-5B-Full-Diffusers model is trained using only DMD**.
 
 ---
 
@@ -37,7 +37,7 @@ FastWan2.2-TI2V-5B-Diffusers is built upon Wan-AI/Wan2.2-TI2V-5B-Diffusers. It s
 ```bash
 num_gpus=1
 export FASTVIDEO_ATTENTION_BACKEND=FLASH_ATTN
-export MODEL_BASE=FastVideo/FastWan2.2-TI2V-5B-Diffusers
+export MODEL_BASE=FastVideo/FastWan2.2-TI2V-5B-Full-Diffusers
 # export MODEL_BASE=hunyuanvideo-community/HunyuanVideo
 # You can either use --prompt or --prompt-txt, but not both.
 fastvideo generate \
@@ -62,7 +62,7 @@ fastvideo generate \
 
 Training was conducted on **8 nodes with 64 H200 GPUs** in total, using a `global batch size = 64`; training ran for **3000 steps (~12 hours)**.
 
-If you use the FastWan2.2-TI2V-5B-Diffusers model for your research, please cite our paper:
+If you use the FastWan2.2-TI2V-5B-Full-Diffusers model for your research, please cite our paper:
 ```
 @article{zhang2025vsa,
   title={VSA: Faster Video Diffusion with Trainable Sparse Attention},
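
The inference hunk above is cut off at `fastvideo generate \` by the diff context. For orientation, here is a minimal sketch of what a complete invocation could look like; it is an illustration only, since the flag names (`--model-path`, `--num-gpus`, `--height`, `--width`, `--num-frames`, `--num-inference-steps`, `--prompt`, `--output-path`) follow FastVideo's CLI conventions but are not shown in this change. Only the 3-step and 121×704×1280 numbers come from the README text itself.

```bash
# Hypothetical sketch -- flag names are assumptions, not taken from this diff.
num_gpus=1
export FASTVIDEO_ATTENTION_BACKEND=FLASH_ATTN
export MODEL_BASE=FastVideo/FastWan2.2-TI2V-5B-Full-Diffusers

fastvideo generate \
    --model-path "$MODEL_BASE" \
    --num-gpus "$num_gpus" \
    --height 704 \
    --width 1280 \
    --num-frames 121 \
    --num-inference-steps 3 \
    --prompt "A fox darts through fresh snow at dawn." \
    --output-path outputs_video/
```

Per the comment in the snippet, a prompt file passed via `--prompt-txt` could replace `--prompt`, but the two options are mutually exclusive.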
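
Since the checkpoint keeps the Diffusers-format layout its name advertises, loading it directly with the `diffusers` library should also be possible. Below is a minimal sketch under that assumption, patterned on the usual Wan2.x Diffusers examples: `WanPipeline` and `AutoencoderKLWan` are existing `diffusers` classes, while the dtype choices, `guidance_scale=0.0`, and `fps=24` are assumptions rather than settings taken from this commit.

```python
# Minimal sketch, assuming the checkpoint loads like other Wan2.x
# Diffusers-format models; dtypes, guidance_scale, and fps are assumptions.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "FastVideo/FastWan2.2-TI2V-5B-Full-Diffusers"

# Wan VAEs are commonly kept in float32 for stability; the transformer in bf16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# 3-step distilled inference at the 121x704x1280 shape quoted in the README.
frames = pipe(
    prompt="A fox darts through fresh snow at dawn.",
    height=704,
    width=1280,
    num_frames=121,
    num_inference_steps=3,
    guidance_scale=0.0,  # distilled models are typically run without CFG (assumption)
).frames[0]
export_to_video(frames, "fastwan_ti2v.mp4", fps=24)
```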