amd/AMD-Hummingbird-I2V

hecui102 committed on Jul 28

Commit f9ad64d (verified) · 1 Parent(s): 3be4669

Update README.md

Files changed (1)

README.md +21 -15

README.md CHANGED

@@ -4,6 +4,7 @@ datasets:
 - nkp37/OpenVid-1M
 - TempoFunk/webvid-10M
 ---
+# AMD Hummingbird image-to-video Model
 ⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited
 computational budgets.Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by
 **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal
@@ -13,7 +14,6 @@ inference steps on an AMD Radeon™ RX 7900 XTX GPU.  Quantitative results on th
 U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
 methodology, and benchmark performance.
-<img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">
 <img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">
@@ -50,22 +50,28 @@ methodology, and benchmark performance.
 </style>
-| Model               | I2V Subj | I2V Bkg | Cam Mot | Subj Cons | Bkg Cons | Mot Smo | Dyn Deg | Aes Qual | Img Qual | Total Score |
-|---------------------|----------|---------|---------|-----------|-----------|----------|----------|-----------|-----------|--------------|
-| CogVideoSFT         | 97.67%   | 98.76%  | 84.93%  | 95.47%    | 98.30%    | 98.35%   | 36.51%   | 59.76%    | 67.64%    | 87.98%       |
-| CogVideoX-12V-5B    | 98.87%   | 99.08%  | 76.25%  | 96.99%    | 99.02%    | 98.85%   | 21.79%   | 60.76%    | 69.53%    | 88.21%       |
-| Step-Video-T12V     | 97.44%   | 98.45%  | 48.15%  | 95.62%    | 96.92%    | 99.08%   | 48.78%   | 61.74%    | 70.17%    | 87.98%       |
-| HunYuan             | -        | -       | -       | -         | 93.85%    | 99.39%   | -        | -         | -         | -            |
-| Wan-2.1-14B         | -        | -       | -       | -         | 98.46%    | 96.07%   | -        | -         | -         | -            |
-| Animate-Anything    | 98.76%   | 98.58%  | 13.08%  | 98.90%    | 98.19%    | 98.61%   | 2.68%    | 67.12%    | 72.09%    | 86.48%       |
-| SEINE-512           | 97.15%   | 96.94%  | 20.97%  | 95.28%    | 97.12%    | 97.12%   | 27.07%   | 64.55%    | 71.39%    | 85.52%       |
-| I2VGen-XL           | 96.48%   | 96.83%  | 18.46%  | 95.45%    | 96.42%    | 98.03%   | 24.08%   | 64.82%    | 69.14%    | 85.28%       |
-| ConsistI2V          | 95.82%   | 95.95%  | 33.92%  | 95.27%    | 94.38%    | 97.38%   | 18.62%   | 59.00%    | 66.92%    | 84.91%       |
-| DynamiCrafter-512   | 97.05%   | 97.56%  | 20.92%  | 94.74%    | 98.29%    | 97.83%   | 40.57%   | 58.71%    | 62.28%    | 85.25%       |
-| Hummingbird-I2V     | 96.30%   | 96.39%  | 12.69%  | 97.10%    | 98.60%    | 98.24%   | 62.60%   | 64.45%    | 69.27%    | 87.05%       |
+<table>
+  <tr>
+    <td><img src="src/01.gif"></td>
+    <td><img src="src/02.gif"></td>
+    <td><img src="src/03.gif"></td>
+    <td><img src="src/04.gif"></td>
+  </tr>
+  <tr>
+    <td><img src="src/05.gif"></td>
+    <td><img src="src/06.gif"></td>
+    <td><img src="src/07.gif"></td>
+    <td><img src="src/08.gif"></td>
+  </tr>
+  <tr>
+    <td><img src="src/09.gif"></td>
+    <td><img src="src/10.gif"></td>
+    <td><img src="src/11.gif"></td>
+    <td><img src="src/12.gif"></td>
+  </tr>
+</table>