Update README.md
README.md
@@ -5,69 +5,68 @@ datasets:
 - TempoFunk/webvid-10M
 ---
 # AMD Hummingbird image-to-video Model
-⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited
-
-**
-
-
-
-U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
-methodology, and benchmark performance.
+⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited computational budgets.
+Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality.
+To further improve output resolution with minimal overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD-proposed reward-guided framework for learning negative embeddings via gradient descent, to further boost visual quality.
+As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16 inference steps on an AMD Radeon™ RX 7900 XTX GPU.
+Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models.
+We provide a detailed analysis of the model architecture, training methodology, and benchmark performance.
 
-<
+<div style="margin: 0; padding: 0;">
+<img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
+</div>
 
-<
-<
-<
-
-
-
-
-
-<
-
-
-
-
-
-<
-
-
-
-
-</
+<div style="margin: 0; padding: 0;">
+<table>
+<tr>
+<td><img src="src/01.gif"></td>
+<td><img src="src/02.gif"></td>
+<td><img src="src/03.gif"></td>
+<td><img src="src/04.gif"></td>
+</tr>
+<tr>
+<td><img src="src/05.gif"></td>
+<td><img src="src/06.gif"></td>
+<td><img src="src/07.gif"></td>
+<td><img src="src/08.gif"></td>
+</tr>
+<tr>
+<td><img src="src/09.gif"></td>
+<td><img src="src/10.gif"></td>
+<td><img src="src/11.gif"></td>
+<td><img src="src/12.gif"></td>
+</tr>
+</table>
+</div>
 
 <style>
 table {
 width: auto;
 border-collapse: collapse;
+margin: 0 auto;
 }
 th, td {
 border: 1px solid #ddd;
 text-align: center;
-padding:
+padding: 0;
 vertical-align: middle;
-width: 256px;
+width: 256px;
 }
-tr.text-row {
-height: 30px; /* text row height */
-}
-tr.image-row {
-height: 160px; /* image row height */
-}
-/* default image size in the table */
 img {
 width: 384px;
 height: 240px;
 object-fit: cover;
+margin: 0 !important;
+padding: 0 !important;
+display: block;
 }
-/* only affects vbench.png */
 .i2v_training_pipeline {
-width:
-
-
+width: 100%;
+max-width: 1200px;
+height: auto;
+object-fit: contain;
+margin: 0 auto;
 }
 </style>
 
 
-
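The updated README states that ReNeg learns negative embeddings via gradient descent under reward guidance. The toy Python sketch that follows illustrates only that general idea and rests on loudly stated assumptions: a hypothetical linear "denoiser" `eps(c) = A @ c`, classifier-free guidance in which a learnable negative embedding takes the place of the usual unconditional embedding, and a hypothetical reward `R(out) = -||out - target||^2`. The names `guided_output`, `loss`, and `learn_negative_embedding` are illustrative only. This is not the ReNeg or Hummingbird-I2V implementation, whose denoiser and reward model are far more complex.

```python
# Toy sketch (hypothetical): learn a negative embedding for classifier-free
# guidance by gradient descent on a reward. NOT the ReNeg implementation.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Return A @ v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def matvec_transposed(a: Matrix, v: Vector) -> Vector:
    """Return A^T @ v."""
    return [sum(a[i][j] * v[i] for i in range(len(a))) for j in range(len(a[0]))]


def guided_output(a: Matrix, pos: Vector, neg: Vector, w: float) -> Vector:
    """Classifier-free guidance with a hypothetical linear denoiser eps(c) = A @ c.

    The negative embedding `neg` stands in for the usual unconditional embedding.
    """
    eps_pos = matvec(a, pos)
    eps_neg = matvec(a, neg)
    return [en + w * (ep - en) for ep, en in zip(eps_pos, eps_neg)]


def loss(out: Vector, target: Vector) -> float:
    """Negated toy reward: R(out) = -||out - target||^2, so loss = ||out - target||^2."""
    return sum((o - t) ** 2 for o, t in zip(out, target))


def learn_negative_embedding(
    a: Matrix,
    pos: Vector,
    neg_init: Vector,
    target: Vector,
    w: float = 3.0,
    lr: float = 0.01,
    steps: int = 500,
) -> Vector:
    """Plain gradient descent on `neg`; the denoiser `a` and `pos` stay frozen."""
    neg = list(neg_init)
    for _ in range(steps):
        out = guided_output(a, pos, neg, w)
        # dL/d(out) = 2 * (out - target)
        d_out = [2.0 * (o - t) for o, t in zip(out, target)]
        # d(out)/d(neg) = (1 - w) * A, so dL/d(neg) = (1 - w) * A^T @ dL/d(out)
        grad = [(1.0 - w) * g for g in matvec_transposed(a, d_out)]
        neg = [n - lr * g for n, g in zip(neg, grad)]
    return neg
```

With an identity `A`, `pos = [1.0, 0.0]`, guidance weight `w = 3.0`, and `target = [2.0, 1.0]`, the guided output is `3 * pos - 2 * neg`, so the loss starts at 2.0 and gradient descent drives `neg` toward `[0.5, -0.5]`, where the output matches the target. Because the guided output weights `eps(neg)` by `(1 - w)`, which is negative for `w > 1`, the chain rule flips the sign of the gradient relative to the positive branch; the `(1.0 - w)` factor in the update accounts for this.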