Tags: Diffusers · Safetensors · WanDMDPipeline
Commit 06c3198 (verified) · committed by BrianChen1129 · 1 parent: cfa0832

Update README.md
Files changed (1): README.md (+1 −5)
```diff
@@ -1,10 +1,6 @@
 ---
 license: apache-2.0
 ---
----
--license: apache-2.0
----
-
 # FastVideo FastWan2.1-T2V-14B-480P-Diffusers
 <p align="center">
 <img src="https://raw.githubusercontent.com/hao-ai-lab/FastVideo/main/assets/logo.jpg" width="200"/>
@@ -40,7 +36,7 @@ This model is jointly finetuned with [DMD](https://arxiv.org/pdf/2405.14867) and
 ### Training Infrastructure

 Training was conducted on **8 nodes with 64 H200 GPUs** in total, using a `global batch size = 64`.
-We enable `gradient checkpointing`, set `HSDP_shard_dim = 8`, 'sequence_parallel_size = 4', and use `learning rate = 1e-5`.
+We enable `gradient checkpointing`, set `HSDP_shard_dim = 8`, `sequence_parallel_size = 4`, and use `learning rate = 1e-5`.
 We set **VSA attention sparsity** to 0.9, and training runs for **3000 steps (~52 hours)**
 The detailed training example script is available [here](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan-Syn-480P/distill_dmd_VSA_t2v_14B_480P.slurm).
```
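For reference, the hyperparameters quoted in the diff can be collected into a small sketch. The field names below are illustrative only, not FastVideo's actual config schema, and the parallelism arithmetic assumes HSDP shards and sequence-parallel groups compose multiplicatively within the 64-GPU world size (the README does not state this explicitly).

```python
# Hypothetical summary of the training setup described in the README diff.
# Field names are illustrative; they do not mirror FastVideo's real config schema.
train_cfg = {
    "nodes": 8,
    "gpus_per_node": 8,            # 8 nodes x 8 H200s = 64 GPUs total
    "global_batch_size": 64,
    "gradient_checkpointing": True,
    "hsdp_shard_dim": 8,
    "sequence_parallel_size": 4,
    "learning_rate": 1e-5,
    "vsa_attention_sparsity": 0.9,
    "train_steps": 3000,           # ~52 hours on this cluster
}

world_size = train_cfg["nodes"] * train_cfg["gpus_per_node"]

# If HSDP sharding and sequence parallelism compose multiplicatively
# (an assumption, not stated in the README), the remaining data-parallel
# replica count works out to:
dp_replicas = world_size // (
    train_cfg["hsdp_shard_dim"] * train_cfg["sequence_parallel_size"]
)

print(world_size, dp_replicas)  # 64 2
```

Under that assumption, the 64 GPUs split into 8-way HSDP shards × 4-way sequence parallelism × 2 data-parallel replicas.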