amd/AMD-Hummingbird-I2V

hecui102 committed on Jul 28

Commit 7f06aca (verified) · 1 parent: cedb93a

Update README.md

README.md +42 -43


@@ -5,69 +5,68 @@ datasets:
 - TempoFunk/webvid-10M
 ---
 # AMD Hummingbird image-to-video Model
-⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited
-computational budgets.Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by
-**reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal
-overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD proposed reward-guided framework for learning
-negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16
-inference steps on an AMD Radeon™ RX 7900 XTX GPU.  Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among
-U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
-methodology, and benchmark performance.
-<img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
-<table>
-  <tr>
-    <td><img src="src/01.gif"></td>
-    <td><img src="src/02.gif"></td>
-    <td><img src="src/03.gif"></td>
-    <td><img src="src/04.gif"></td>
-  </tr>
-  <tr>
-    <td><img src="src/05.gif"></td>
-    <td><img src="src/06.gif"></td>
-    <td><img src="src/07.gif"></td>
-    <td><img src="src/08.gif"></td>
-  </tr>
-  <tr>
-    <td><img src="src/09.gif"></td>
-    <td><img src="src/10.gif"></td>
-    <td><img src="src/11.gif"></td>
-    <td><img src="src/12.gif"></td>
-  </tr>
-</table>
 <style>
   table {
     width: auto;
     border-collapse: collapse;
   }
   th, td {
     border: 1px solid #ddd;
     text-align: center;
-    padding: 0px;
     vertical-align: middle;
-    width: 256px; /* fixed width for each column */
   }
-  tr.text-row {
-    height: 30px; /* text row height */
-  }
-  tr.image-row {
-    height: 160px; /* image row height */
-  }
-  /* default size of images in the table */
   img {
     width: 384px;
     height: 240px;
     object-fit: cover;
   }
-  /* only affects vbench.png */
   .i2v_training_pipeline {
-    width: 1200px !important;
-    height: 900px !important;
-    object-fit: contain; /* show the whole image without cropping */
   }
 </style>

 - TempoFunk/webvid-10M
 ---
 # AMD Hummingbird image-to-video Model
+⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited computational budgets.
+Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality.
+To further improve output resolution with minimal overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD proposed reward-guided framework for learning negative embeddings via gradient descent, to further boost visual quality.
+As a result, Hummingbird-I2V can generate high-quality 4K video in just **11 seconds** with 16 inference steps on an AMD Radeon™ RX 7900 XTX GPU.
+Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models.
+We provide a detailed analysis of the model architecture, training methodology, and benchmark performance.
+<div style="margin: 0; padding: 0;">
+  <img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
+</div>
+<div style="margin: 0; padding: 0;">
+  <table>
+    <tr>
+      <td><img src="src/01.gif"></td>
+      <td><img src="src/02.gif"></td>
+      <td><img src="src/03.gif"></td>
+      <td><img src="src/04.gif"></td>
+    </tr>
+    <tr>
+      <td><img src="src/05.gif"></td>
+      <td><img src="src/06.gif"></td>
+      <td><img src="src/07.gif"></td>
+      <td><img src="src/08.gif"></td>
+    </tr>
+    <tr>
+      <td><img src="src/09.gif"></td>
+      <td><img src="src/10.gif"></td>
+      <td><img src="src/11.gif"></td>
+      <td><img src="src/12.gif"></td>
+    </tr>
+  </table>
+</div>
 <style>
   table {
     width: auto;
     border-collapse: collapse;
+    margin: 0 auto;
   }
   th, td {
     border: 1px solid #ddd;
     text-align: center;
+    padding: 0;
     vertical-align: middle;
+    width: 256px;
   }
   img {
     width: 384px;
     height: 240px;
     object-fit: cover;
+    margin: 0 !important;
+    padding: 0 !important;
+    display: block;
   }
   .i2v_training_pipeline {
+    width: 100%;
+    max-width: 1200px;
+    height: auto;
+    object-fit: contain;
+    margin: 0 auto;
   }
 </style>