hecui102 committed
Commit 7f06aca · verified · 1 Parent(s): cedb93a

Update README.md

Files changed (1)
  1. README.md +42 -43
README.md CHANGED
@@ -5,69 +5,68 @@ datasets:
  - TempoFunk/webvid-10M
  ---
  # AMD Hummingbird image-to-video Model
- ⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited
- computational budgets.Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by
- **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal
- overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD proposed reward-guided framework for learning
- negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16
- inference steps on an AMD Radeon™ RX 7900 XTX GPU. Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among
- U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
- methodology, and benchmark performance.
+ ⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited computational budgets.
+ Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality.
+ To further improve output resolution with minimal overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD proposed reward-guided framework for learning negative embeddings via gradient descent, to further boost visual quality.
+ As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16 inference steps on an AMD Radeon™ RX 7900 XTX GPU.
+ Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models.
+ We provide a detailed analysis of the model architecture, training methodology, and benchmark performance.
 
- <img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
+ <div style="margin: 0; padding: 0;">
+ <img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
+ </div>
 
- <table>
- <tr>
- <td><img src="src/01.gif"></td>
- <td><img src="src/02.gif"></td>
- <td><img src="src/03.gif"></td>
- <td><img src="src/04.gif"></td>
- </tr>
- <tr>
- <td><img src="src/05.gif"></td>
- <td><img src="src/06.gif"></td>
- <td><img src="src/07.gif"></td>
- <td><img src="src/08.gif"></td>
- </tr>
- <tr>
- <td><img src="src/09.gif"></td>
- <td><img src="src/10.gif"></td>
- <td><img src="src/11.gif"></td>
- <td><img src="src/12.gif"></td>
- </tr>
- </table>
+ <div style="margin: 0; padding: 0;">
+ <table>
+ <tr>
+ <td><img src="src/01.gif"></td>
+ <td><img src="src/02.gif"></td>
+ <td><img src="src/03.gif"></td>
+ <td><img src="src/04.gif"></td>
+ </tr>
+ <tr>
+ <td><img src="src/05.gif"></td>
+ <td><img src="src/06.gif"></td>
+ <td><img src="src/07.gif"></td>
+ <td><img src="src/08.gif"></td>
+ </tr>
+ <tr>
+ <td><img src="src/09.gif"></td>
+ <td><img src="src/10.gif"></td>
+ <td><img src="src/11.gif"></td>
+ <td><img src="src/12.gif"></td>
+ </tr>
+ </table>
+ </div>
 
  <style>
  table {
  width: auto;
  border-collapse: collapse;
+ margin: 0 auto;
  }
  th, td {
  border: 1px solid #ddd;
  text-align: center;
- padding: 0px;
+ padding: 0;
  vertical-align: middle;
- width: 256px; /* fixed width for each column */
+ width: 256px;
  }
- tr.text-row {
- height: 30px; /* text row height */
- }
- tr.image-row {
- height: 160px; /* image row height */
- }
- /* default size for images in tables */
  img {
  width: 384px;
  height: 240px;
  object-fit: cover;
+ margin: 0 !important;
+ padding: 0 !important;
+ display: block;
  }
- /* only affects vbench.png */
  .i2v_training_pipeline {
- width: 1200px !important;
- height: 900px !important;
- object-fit: contain; /* show the full image without cropping */
+ width: 100%;
+ max-width: 1200px;
+ height: auto;
+ object-fit: contain;
+ margin: 0 auto;
  }
  </style>
 
 
-