Update README.md

README.md CHANGED

@@ -34,7 +34,8 @@ This model is jointly finetuned with [DMD](https://arxiv.org/pdf/2405.14867) and
 - 3-step inference is supported and achieves up to **20 FPS** on a single **H100** GPU.
 - Our model is trained at **61×448×832** resolution, but it supports generating videos at any resolution (quality may degrade).
 - Finetuning and inference scripts are available in the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository:
-  - [
+  - [1 Node/GPU debugging finetuning script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/distill/v1_distill_dmd_wan_VSA.sh)
+  - [Slurm training example script](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan-Syn-480P/distill_dmd_VSA_t2v.slurm)
   - [Inference script](https://github.com/hao-ai-lab/FastVideo/blob/main/scripts/inference/v1_inference_wan_dmd.sh)
 - Try it out on **FastVideo** — we support a wide range of GPUs from **H100** to **4090**, and also support **Mac** users!

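To make the 3-step inference claim in the hunk above concrete, here is a minimal sketch of driving the checkpoint directly from Diffusers, since it ships in Diffusers format. The repo id below, the applicability of the standard `WanPipeline`/`AutoencoderKLWan` loading path to this checkpoint, and disabling classifier-free guidance for the DMD-distilled weights are assumptions, not taken from the diff; the linked inference script remains the reference path.

```python
# Minimal sketch: 3-step text-to-video with the DMD-distilled checkpoint via Diffusers.
# Assumptions (not confirmed by this README diff): the repo id, that the standard
# Wan text-to-video pipeline applies, and that CFG is baked in by distillation.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "FastVideo/FastWan2.1-T2V-1.3B-Diffusers"  # hypothetical repo id

# The Wan VAE is typically kept in float32 for numerical stability.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(
    prompt="A corgi surfing a wave at sunset",
    height=448,             # training resolution: 61 x 448 x 832
    width=832,
    num_frames=61,
    num_inference_steps=3,  # 3-step inference, per the README
    guidance_scale=1.0,     # assumption: guidance is distilled away by DMD
).frames[0]

export_to_video(video, "fastwan_t2v.mp4", fps=16)
```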

@@ -43,9 +44,6 @@ This model is jointly finetuned with [DMD](https://arxiv.org/pdf/2405.14867) and
 Training was conducted on **4 nodes with 32 H200 GPUs** in total, using a `global batch size = 64`.
 We enable `gradient checkpointing`, set `gradient_accumulation_steps=2`, and use `learning rate = 1e-5`.
 We set **VSA attention sparsity** to 0.8, and training runs for **4000 steps (~12 hours)**.
-The detailed **training example script** is available [here](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan-Syn-480P/distill_dmd_VSA_t2v.slurm).
-
-

 If you use the FastWan2.1-T2V-1.3B-Diffusers model for your research, please cite our paper:
 ```
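One detail the training hunk above leaves implicit is the per-GPU micro-batch size. A quick sanity check under the stated numbers, assuming the usual convention `global batch = #GPUs × micro-batch × gradient_accumulation_steps` (the convention itself is an assumption, not spelled out in the README):

```python
# Back-of-the-envelope check of the distillation batch configuration described above.
# GPU count, accumulation steps, global batch size, and step count come from the
# README; the per-GPU micro-batch and total sample count are derived, not stated.
num_gpus = 32                     # 4 nodes, 32 H200 GPUs in total
gradient_accumulation_steps = 2
global_batch_size = 64

micro_batch_per_gpu = global_batch_size // (num_gpus * gradient_accumulation_steps)
assert micro_batch_per_gpu == 1   # one 61x448x832 clip per GPU per forward pass

steps = 4000
samples_seen = steps * global_batch_size
print(f"micro-batch per GPU: {micro_batch_per_gpu}, samples seen: {samples_seen:,}")  # 256,000
```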