Update README.md
README.md
CHANGED
@@ -8,13 +8,54 @@ datasets:
|
|
8 |
computational budgets. Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by
|
9 |
**reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal
|
10 |
overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD-proposed reward-guided framework for learning
|
11 |
-
negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just 11 seconds with 16
|
12 |
inference steps on an AMD Radeon™ RX 7900 XTX GPU. Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among
|
13 |
U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
|
14 |
methodology, and benchmark performance.
|
15 |
|
16 |
<img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">
|
17 |
|
18 |
| Model | I2V Subj | I2V Bkg | Cam Mot | Subj Cons | Bkg Cons | Mot Smo | Dyn Deg | Aes Qual | Img Qual | Total Score |
|
19 |
|---------------------|----------|---------|---------|-----------|-----------|----------|----------|-----------|-----------|--------------|
|
20 |
| CogVideoSFT | 97.67% | 98.76% | 84.93% | 95.47% | 98.30% | 98.35% | 36.51% | 59.76% | 67.64% | 87.98% |
|
|
|
8 |
computational budgets. Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by
|
9 |
**reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal
|
10 |
overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD-proposed reward-guided framework for learning
|
11 |
+
negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16
|
12 |
inference steps on an AMD Radeon™ RX 7900 XTX GPU. Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among
|
13 |
U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
|
14 |
methodology, and benchmark performance.
|
15 |
|
16 |
<img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">
|
17 |
|
18 |
+
<img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
|
19 |
+
|
20 |
+
<style>
|
21 |
+
table {
|
22 |
+
width: auto;
|
23 |
+
border-collapse: collapse;
|
24 |
+
}
|
25 |
+
th, td {
|
26 |
+
border: 1px solid #ddd;
|
27 |
+
text-align: center;
|
28 |
+
padding: 0px;
|
29 |
+
vertical-align: middle;
|
30 |
+
width: 256px; /* fixed width for every column */
|
31 |
+
}
|
32 |
+
tr.text-row {
|
33 |
+
height: 30px; /* height of text rows */
|
34 |
+
}
|
35 |
+
tr.image-row {
|
36 |
+
height: 160px; /* height of image rows */
|
37 |
+
}
|
38 |
+
/* default size for images inside tables */
|
39 |
+
img {
|
40 |
+
width: 256px;
|
41 |
+
height: 160px;
|
42 |
+
object-fit: cover;
|
43 |
+
}
|
44 |
+
/* applies only to vbench.png */
|
45 |
+
.vbench-img {
|
46 |
+
width: 785px !important;
|
47 |
+
height: 698px !important;
|
48 |
+
object-fit: contain; /* show the whole image without cropping */
|
49 |
+
}
|
50 |
+
</style>
|
51 |
+
|
52 |
+
|
53 |
+
|
54 |
+
|
55 |
+
|
56 |
+
|
57 |
+
|
58 |
+
|
59 |
| Model | I2V Subj | I2V Bkg | Cam Mot | Subj Cons | Bkg Cons | Mot Smo | Dyn Deg | Aes Qual | Img Qual | Total Score |
|
60 |
|---------------------|----------|---------|---------|-----------|-----------|----------|----------|-----------|-----------|--------------|
|
61 |
| CogVideoSFT | 97.67% | 98.76% | 84.93% | 95.47% | 98.30% | 98.35% | 36.51% | 59.76% | 67.64% | 87.98% |
|