amd/AMD-Hummingbird-I2V

hecui102 committed on Jul 28

Commit 9c2636b (verified) · Parent: 14bd6e2

Update README.md

Files changed (1)

README.md +42 -1

@@ -8,13 +8,54 @@ datasets:
 computational budgets.Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by
 **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal
 overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD proposed reward-guided framework for learning
-negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just 11 seconds with 16
+negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16
 inference steps on an AMD Radeon™ RX 7900 XTX GPU.  Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among
 U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
 methodology, and benchmark performance.
 <img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">
+<img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
+<style>
+  table {
+    width: auto;
+    border-collapse: collapse;
+  }
+  th, td {
+    border: 1px solid #ddd;
+    text-align: center;
+    padding: 0px;
+    vertical-align: middle;
+    width: 256px; /* fixed width for every column */
+  }
+  tr.text-row {
+    height: 30px; /* height of text rows */
+  }
+  tr.image-row {
+    height: 160px; /* height of image rows */
+  }
+  /* default size for images inside tables */
+  img {
+    width: 256px;
+    height: 160px;
+    object-fit: cover;
+  }
+  /* applies only to vbench.png */
+  .vbench-img {
+    width: 785px !important;
+    height: 698px !important;
+    object-fit: contain; /* show the whole image without cropping */
+  }
+</style>
 | Model               | I2V Subj | I2V Bkg | Cam Mot | Subj Cons | Bkg Cons | Mot Smo | Dyn Deg | Aes Qual | Img Qual | Total Score |
 |---------------------|----------|---------|---------|-----------|-----------|----------|----------|-----------|-----------|--------------|
 | CogVideoSFT         | 97.67%   | 98.76%  | 84.93%  | 95.47%    | 98.30%    | 98.35%   | 36.51%   | 59.76%    | 67.64%    | 87.98%       |