amd/AMD-Hummingbird-I2V

	---
	license: agpl-3.0
	datasets:
	- nkp37/OpenVid-1M
	- TempoFunk/webvid-10M
	---
	⚡️ In this work, we present AMD Hummingbird-I2V, a compact and efficient diffusion-based image-to-video (I2V) model designed for high-quality video synthesis under
	limited computational budgets. Hummingbird-I2V adopts a lightweight U-Net architecture with 0.9B parameters and a novel two-stage training strategy guided by
	reward-based feedback, resulting in substantial improvements in inference speed, model efficiency, and visual quality. To improve output resolution with minimal
	overhead, we introduce a super-resolution module at the end of the pipeline. Additionally, we leverage ReNeg, an AMD-proposed reward-guided framework for learning
	negative embeddings via gradient descent, to further boost visual quality. As a result, Hummingbird-I2V can generate high-quality 4K video in just 11 seconds with 16
	inference steps on an AMD Radeon™ RX 7900 XTX GPU. Quantitative results on the VBench-I2V benchmark show that Hummingbird-I2V achieves state-of-the-art performance among
	U-Net-based diffusion models and competitive results compared to significantly larger DiT-based models. We provide a detailed analysis of the model architecture, training
	methodology, and benchmark performance.
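
The idea of learning a negative embedding by gradient descent on a reward can be pictured with a toy example. The sketch below is not the ReNeg implementation. It assumes the negative embedding enters through a classifier-free-guidance style combination, and it replaces the denoiser and the reward model with hypothetical stand-ins (`guided_output`, `reward`, and a fixed `target` vector) so that the gradient can be written by hand.

```python
# Toy illustration only: every function, name, and value here is hypothetical.

def guided_output(cond, neg, scale):
    """Guidance-style combination: neg + scale * (cond - neg).

    Assumption: the learned negative embedding takes the place of the usual
    unconditional branch. The denoiser itself is replaced by the identity.
    """
    return [n + scale * (c - n) for c, n in zip(cond, neg)]


def reward(out, target):
    """Hypothetical reward: negative squared distance to a preferred output."""
    return -sum((o - t) ** 2 for o, t in zip(out, target))


def learn_negative_embedding(cond, target, scale=7.5, lr=0.005, steps=200):
    """Gradient ascent on the reward (descent on its negative) w.r.t. neg."""
    neg = [0.0] * len(cond)
    for _ in range(steps):
        out = guided_output(cond, neg, scale)
        # Chain rule: d(reward)/d(out_i) = -2 * (out_i - target_i),
        #             d(out_i)/d(neg_i)  = 1 - scale.
        grad = [-2.0 * (o - t) * (1.0 - scale) for o, t in zip(out, target)]
        neg = [n + lr * g for n, g in zip(neg, grad)]
    return neg


cond = [0.2, -0.1, 0.4]    # stand-in for the conditional prediction
target = [0.5, 0.0, 0.3]   # stand-in for what the reward model prefers
neg = learn_negative_embedding(cond, target)

before = reward(guided_output(cond, [0.0] * len(cond), 7.5), target)
after = reward(guided_output(cond, neg, 7.5), target)
print(f"reward with zero negative embedding:    {before:.4f}")
print(f"reward with learned negative embedding: {after:.4f}")
```

In a real pipeline the gradient would be obtained by backpropagating a learned reward model through the diffusion network rather than derived by hand. The toy only shows which quantity is optimized and which parameters are kept fixed.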

	<img src="src/key_takeway.png" alt="key_takeway" title="key_takeway" class="key_takeway">

	<img src="src/i2v_training_pipeline.png" alt="i2v_training_pipeline" title="i2v_training_pipeline" class="i2v_training_pipeline">

	<style>
	table {
	width: auto;
	border-collapse: collapse;
	}
	th, td {
	border: 1px solid #ddd;
	text-align: center;
	padding: 0px;
	vertical-align: middle;
	width: 256px; /* fixed width for every column */
	}
	tr.text-row {
	height: 30px; /* height of text rows */
	}
	tr.image-row {
	height: 160px; /* height of image rows */
	}
	/* default size for images inside tables */
	img {
	width: 256px;
	height: 160px;
	object-fit: cover;
	}
	/* applies only to vbench.png */
	.vbench-img {
	width: 785px !important;
	height: 698px !important;
	object-fit: contain; /* show the whole image without cropping */
	}
	</style>








	| Model | I2V Subject | I2V Background | Camera Motion | Subject Consistency | Background Consistency | Motion Smoothness | Dynamic Degree | Aesthetic Quality | Imaging Quality | Total Score |
	|-------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
	| CogVideoSFT | 97.67% | 98.76% | 84.93% | 95.47% | 98.30% | 98.35% | 36.51% | 59.76% | 67.64% | 87.98% |
	| CogVideoX-I2V-5B | 98.87% | 99.08% | 76.25% | 96.99% | 99.02% | 98.85% | 21.79% | 60.76% | 69.53% | 88.21% |
	| Step-Video-TI2V | 97.44% | 98.45% | 48.15% | 95.62% | 96.92% | 99.08% | 48.78% | 61.74% | 70.17% | 87.98% |
	| HunYuan | - | - | - | - | 93.85% | 99.39% | - | - | - | - |
	| Wan-2.1-14B | - | - | - | - | 98.46% | 96.07% | - | - | - | - |
	| Animate-Anything | 98.76% | 98.58% | 13.08% | 98.90% | 98.19% | 98.61% | 2.68% | 67.12% | 72.09% | 86.48% |
	| SEINE-512 | 97.15% | 96.94% | 20.97% | 95.28% | 97.12% | 97.12% | 27.07% | 64.55% | 71.39% | 85.52% |
	| I2VGen-XL | 96.48% | 96.83% | 18.46% | 95.45% | 96.42% | 98.03% | 24.08% | 64.82% | 69.14% | 85.28% |
	| ConsistI2V | 95.82% | 95.95% | 33.92% | 95.27% | 94.38% | 97.38% | 18.62% | 59.00% | 66.92% | 84.91% |
	| DynamiCrafter-512 | 97.05% | 97.56% | 20.92% | 94.74% | 98.29% | 97.83% | 40.57% | 58.71% | 62.28% | 85.25% |
	| Hummingbird-I2V | 96.30% | 96.39% | 12.69% | 97.10% | 98.60% | 98.24% | 62.60% | 64.45% | 69.27% | 87.05% |
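
To make the table easier to read, the following sketch ranks the models on two of its columns. The numbers are copied from the table. The names `SCORES` and `rank_by` are introduced here for illustration only, and the two rows whose Total Score is "-" are left out because they cannot be ranked.

```python
# (model, dynamic_degree_pct, total_score_pct), copied from the table above.
# HunYuan and Wan-2.1-14B are omitted: the table reports "-" for both columns.
SCORES = [
    ("CogVideoSFT", 36.51, 87.98),
    ("CogVideoX-I2V-5B", 21.79, 88.21),
    ("Step-Video-TI2V", 48.78, 87.98),
    ("Animate-Anything", 2.68, 86.48),
    ("SEINE-512", 27.07, 85.52),
    ("I2VGen-XL", 24.08, 85.28),
    ("ConsistI2V", 18.62, 84.91),
    ("DynamiCrafter-512", 40.57, 85.25),
    ("Hummingbird-I2V", 62.60, 87.05),
]


def rank_by(rows, column):
    """Return model names sorted from highest to lowest on a column index."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=True)
    return [row[0] for row in ordered]


by_dynamic = rank_by(SCORES, 1)
by_total = rank_by(SCORES, 2)

print("Highest Dynamic Degree:", by_dynamic[0])
print("Total Score rank of Hummingbird-I2V:",
      by_total.index("Hummingbird-I2V") + 1, "of", len(by_total))
```

On these figures Hummingbird-I2V has the highest Dynamic Degree of the listed models and places fourth of nine on Total Score, behind CogVideoX-I2V-5B and the two models tied at 87.98%.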