
Commit 9c2636b (verified) by hecui102 · 1 parent: 14bd6e2

Update README.md

Files changed (1)
  1. README.md +42 -1
README.md CHANGED
@@ -8,13 +8,54 @@ datasets:
8
  computational budgets. Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by
9
  **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal
10
  overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD-proposed reward-guided framework for learning
11
- negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just 11 seconds with 16
12
  inference steps on an AMD Radeon™ RX 7900 XTX GPU. Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among
13
  U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
14
  methodology, and benchmark performance.
15
 
16
  <img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">
17
 
18
  | Model | I2V Subj | I2V Bkg | Cam Mot | Subj Cons | Bkg Cons | Mot Smo | Dyn Deg | Aes Qual | Img Qual | Total Score |
19
  |---------------------|----------|---------|---------|-----------|-----------|----------|----------|-----------|-----------|--------------|
20
  | CogVideoSFT | 97.67% | 98.76% | 84.93% | 95.47% | 98.30% | 98.35% | 36.51% | 59.76% | 67.64% | 87.98% |
 
8
  computational budgets. Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by
9
  **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal
10
  overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD-proposed reward-guided framework for learning
11
+ negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16
12
  inference steps on an AMD Radeon™ RX 7900 XTX GPU. Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among
13
  U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
14
  methodology, and benchmark performance.
15
 
16
  <img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">
17
 
18
+ <img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
19
+
20
+ <style>
21
+ table {
22
+ width: auto;
23
+ border-collapse: collapse;
24
+ }
25
+ th, td {
26
+ border: 1px solid #ddd;
27
+ text-align: center;
28
+ padding: 0px;
29
+ vertical-align: middle;
30
+ width: 256px; /* fixed width for every column */
31
+ }
32
+ tr.text-row {
33
+ height: 30px; /* height of text rows */
34
+ }
35
+ tr.image-row {
36
+ height: 160px; /* height of image rows */
37
+ }
38
+ /* default size for images inside tables */
39
+ img {
40
+ width: 256px;
41
+ height: 160px;
42
+ object-fit: cover;
43
+ }
44
+ /* applies only to vbench.png */
45
+ .vbench-img {
46
+ width: 785px !important;
47
+ height: 698px !important;
48
+ object-fit: contain; /* show the whole image without cropping */
49
+ }
50
+ </style>
51
+
52
+
53
+
54
+
55
+
56
+
57
+
58
+
59
  | Model | I2V Subj | I2V Bkg | Cam Mot | Subj Cons | Bkg Cons | Mot Smo | Dyn Deg | Aes Qual | Img Qual | Total Score |
60
  |---------------------|----------|---------|---------|-----------|-----------|----------|----------|-----------|-----------|--------------|
61
  | CogVideoSFT | 97.67% | 98.76% | 84.93% | 95.47% | 98.30% | 98.35% | 36.51% | 59.76% | 67.64% | 87.98% |