Update README.md
README.md
datasets:
- nkp37/OpenVid-1M
- TempoFunk/webvid-10M
---
⚡️ In this work, we present **AMD Hummingbird-I2V**, a compact and efficient **diffusion-based** I2V model designed for high-quality video synthesis under limited computational budgets. Hummingbird-I2V adopts a lightweight **U-Net** architecture with **0.9B parameters** and a novel two-stage training strategy guided by **reward-based feedback**, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To further improve output resolution with minimal overhead, we introduce a **super-resolution** module at the end of the pipeline. Additionally, we leverage **ReNeg**, an AMD-proposed reward-guided framework for learning negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just 11 seconds with 16 inference steps on an AMD Radeon™ RX 7900 XTX GPU. Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training methodology, and benchmark performance.
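The stages described above (few-step guided denoising with a ReNeg-style learned negative embedding, followed by super-resolution) can be pictured with a small stand-in sketch. This is illustrative PyTorch pseudocode, not the released Hummingbird-I2V implementation: the modules, tensor shapes, guidance scale, and scheduler update are placeholder assumptions; only the 16-step setting and the ordering of the stages follow the description above.

```python
import torch

torch.manual_seed(0)
DIM = 64

unet = torch.nn.Linear(DIM, DIM)                  # stand-in for the 0.9B video U-Net
super_resolution = torch.nn.Linear(DIM, 4 * DIM)  # stand-in for the super-resolution module
prompt_embed = torch.randn(1, DIM)                # conditioning from the input image/prompt (assumed shape)
negative_embed = torch.zeros(1, DIM)              # ReNeg-style negative embedding, learned offline by gradient descent

latents = torch.randn(1, DIM)                     # initial noise
guidance_scale = 7.5                              # assumed value, not from the model card
num_inference_steps = 16                          # the 16-step setting quoted above

with torch.no_grad():
    for _ in range(num_inference_steps):
        # Two U-Net evaluations per step: conditional branch and learned-negative branch
        noise_cond = unet(latents + prompt_embed)
        noise_neg = unet(latents + negative_embed)
        # Classifier-free guidance, with the learned negative embedding standing in
        # for the usual empty-prompt embedding
        noise_pred = noise_neg + guidance_scale * (noise_cond - noise_neg)
        # Toy update standing in for the real scheduler step
        latents = latents - noise_pred / num_inference_steps

    video_latents = super_resolution(latents)     # upscaling happens at the end of the pipeline

print(video_latents.shape)
```

In the actual pipeline the negative embedding is produced offline by ReNeg's reward-guided gradient descent and then reused at inference time, presumably in place of the empty-prompt embedding used for classifier-free guidance.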
<img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">

| Model | I2V Subj | I2V Bkg | Cam Mot | Subj Cons | Bkg Cons | Mot Smo | Dyn Deg | Aes Qual | Img Qual | Total Score |
|---------------------|----------|---------|---------|-----------|----------|---------|---------|----------|----------|-------------|
| CogVideoSFT | 97.67% | 98.76% | 84.93% | 95.47% | 98.30% | 98.35% | 36.51% | 59.76% | 67.64% | 87.98% |
| CogVideoX-I2V-5B | 98.87% | 99.08% | 76.25% | 96.99% | 99.02% | 98.85% | 21.79% | 60.76% | 69.53% | 88.21% |
| Step-Video-TI2V | 97.44% | 98.45% | 48.15% | 95.62% | 96.92% | 99.08% | 48.78% | 61.74% | 70.17% | 87.98% |
| HunYuan | - | - | - | - | 93.85% | 99.39% | - | - | - | - |
| Wan-2.1-14B | - | - | - | - | 98.46% | 96.07% | - | - | - | - |
| Animate-Anything | 98.76% | 98.58% | 13.08% | 98.90% | 98.19% | 98.61% | 2.68% | 67.12% | 72.09% | 86.48% |
| SEINE-512 | 97.15% | 96.94% | 20.97% | 95.28% | 97.12% | 97.12% | 27.07% | 64.55% | 71.39% | 85.52% |
| I2VGen-XL | 96.48% | 96.83% | 18.46% | 95.45% | 96.42% | 98.03% | 24.08% | 64.82% | 69.14% | 85.28% |
| ConsistI2V | 95.82% | 95.95% | 33.92% | 95.27% | 94.38% | 97.38% | 18.62% | 59.00% | 66.92% | 84.91% |
| DynamiCrafter-512 | 97.05% | 97.56% | 20.92% | 94.74% | 98.29% | 97.83% | 40.57% | 58.71% | 62.28% | 85.25% |
| Hummingbird-I2V | 96.30% | 96.39% | 12.69% | 97.10% | 98.60% | 98.24% | 62.60% | 64.45% | 69.27% | 87.05% |