|
--- |
|
base_model: |
|
- Wan-AI/Wan2.2-I2V-A14B |
|
base_model_relation: quantized |
|
pipeline_tag: image-to-video |
|
tags: |
|
- dfloat11 |
|
- df11 |
|
- lossless compression |
|
- 70% size, 100% accuracy |
|
--- |
|
|
|
# DFloat11 Compressed Model: `Wan-AI/Wan2.2-I2V-A14B` |
|
|
|
This is a **DFloat11 losslessly compressed** version of the original `Wan-AI/Wan2.2-I2V-A14B` model. It reduces model size by **32%** compared to the original BFloat16 model, while maintaining **bit-identical outputs** and supporting **efficient GPU inference**. |
|
|
|
🔥🔥🔥 Thanks to DFloat11 compression, `Wan-AI/Wan2.2-I2V-A14B` can now generate a 5-second 720P video on a single 24GB GPU, while maintaining full model quality. 🔥🔥🔥 |
|
|
|
### 📊 Performance Comparison |
|
|
|
| Model | Model Size | Peak GPU Memory (5-second 720P generation) | Generation Time (A100 GPU) | |
|
|----------------------------------------------------|--------------------|----------------------------------------------|----------------------------| |
|
| Wan-AI/Wan2.2-I2V-A14B (BFloat16)                  | ~56 GB             | OOM (out of memory)                           | N/A                        |
|
| Wan-AI/Wan2.2-I2V-A14B (DFloat11) | 19.47 + 19.44 GB | 29.12 GB | 42 minutes | |
|
| Wan-AI/Wan2.2-I2V-A14B (DFloat11 + CPU Offloading) | 19.47 + 19.44 GB | 20.01 GB | 44 minutes |

*DFloat11 sizes are reported as the sum of the two compressed transformer checkpoints loaded below: `transformer` (high-noise expert) and `transformer_2` (low-noise expert).*
|
|
|
### 🔍 How It Works |
|
|
|
We apply Huffman coding to the exponent bits of BFloat16 model weights, which are highly compressible. We leverage hardware-aware algorithmic designs to enable highly efficient, on-the-fly weight decompression directly on the GPU. Find out more in our [research paper](https://arxiv.org/abs/2504.11651). |
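To build intuition for why this works, here is a small illustrative sketch (not the DFloat11 implementation): it measures the empirical Shannon entropy of the exponent bits of Gaussian stand-in weights. Trained model weights behave similarly, with the 8-bit exponent carrying only around 2–3 bits of information.

```python
import math
from collections import Counter

import torch

# Stand-in for real model weights; trained weights are roughly Gaussian too
weights = torch.randn(1_000_000).to(torch.bfloat16)

# Reinterpret each BFloat16 value as a raw 16-bit pattern and extract the
# 8 exponent bits (bits 7-14)
bits = weights.view(torch.int16).to(torch.int32) & 0xFFFF
exponents = ((bits >> 7) & 0xFF).tolist()

# Empirical Shannon entropy of the exponent distribution, in bits per value
counts = Counter(exponents)
total = len(exponents)
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(f"Exponent entropy: {entropy:.2f} bits (8 bits stored)")
```

An entropy far below 8 bits means a Huffman code can shrink the exponents close to that bound, while the sign and mantissa bits are stored as-is; this is where the roughly 30% size reduction comes from.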
|
|
|
### 🔧 How to Use |
|
|
|
1. Install or upgrade the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*: |
|
|
|
```bash |
|
pip install -U "dfloat11[cuda12]"
|
``` |
|
|
|
2. Install the latest `diffusers` package from source: |
|
|
|
```bash |
|
pip install git+https://github.com/huggingface/diffusers |
|
``` |
|
|
|
3. Save the following code to a Python file `i2v.py`: |
|
|
|
```python |
|
import time |
|
import torch |
|
import numpy as np |
|
import argparse |
|
from diffusers import WanImageToVideoPipeline |
|
from diffusers.utils import export_to_video, load_image |
|
from dfloat11 import DFloat11Model |
|
|
|
parser = argparse.ArgumentParser(description='Image to Video generation using Wan2.2-I2V model') |
|
parser.add_argument('--cpu_offload', action='store_true', help='Enable CPU offloading') |
|
parser.add_argument('--image_path', type=str, default="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG", help='Path or URL to the input image') |
|
parser.add_argument('--width', type=int, default=1280, help='Output video width') |
|
parser.add_argument('--height', type=int, default=720, help='Output video height') |
|
parser.add_argument('--prompt', type=str, default="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.", help='Prompt for video generation') |
|
# The default negative prompt is Wan's recommended Chinese prompt, kept in Chinese to match the model's
# training. Roughly: "garish tones, overexposed, static, blurry details, subtitles, style, artwork,
# painting, still frame, overall gray, worst quality, low quality, JPEG artifacts, ugly, mutilated,
# extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs,
# fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards"
parser.add_argument('--negative_prompt', type=str, default="色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走", help='Negative prompt for video generation')
|
# Wan's temporal VAE expects 4*k + 1 frames (see the note after the run commands below)
parser.add_argument('--num_frames', type=int, default=81, help='Number of frames to generate')
|
parser.add_argument('--guidance_scale', type=float, default=3.5, help='Guidance scale for generation') |
|
parser.add_argument('--num_inference_steps', type=int, default=40, help='Number of inference steps') |
|
parser.add_argument('--seed', type=int, default=42, help='Random seed for generation') |
|
parser.add_argument('--output', type=str, default='i2v_output.mp4', help='Output video path') |
|
parser.add_argument('--fps', type=int, default=16, help='FPS of output video') |
|
|
|
args = parser.parse_args() |
|
|
|
image = load_image(args.image_path) |
|
|
|
# Load the Wan2.2 I2V pipeline in BFloat16; both of its transformers are
# replaced in-place with DFloat11-compressed weights below
pipe = WanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16)
|
|
|
# Replace the weights of both diffusion transformers with their DFloat11-compressed
# versions: `transformer` (high-noise expert) and `transformer_2` (low-noise expert)
DFloat11Model.from_pretrained(
    "DFloat11/Wan2.2-I2V-A14B-DF11",
    device="cpu",
    cpu_offload=args.cpu_offload,
    bfloat16_model=pipe.transformer,
)
DFloat11Model.from_pretrained(
    "DFloat11/Wan2.2-I2V-A14B-2-DF11",
    device="cpu",
    cpu_offload=args.cpu_offload,
    bfloat16_model=pipe.transformer_2,
)
|
|
|
# Keep pipeline components on CPU and move each to the GPU only while it runs
pipe.enable_model_cpu_offload()
|
|
|
# Fit the image to the requested resolution: keep its aspect ratio, cap the area
# at width*height, and round both sides down to a multiple of the VAE/patch grid
max_area = args.width * args.height
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))
|
|
|
# Fixed seed for reproducible sampling
generator = torch.Generator(device="cuda").manual_seed(args.seed)
|
|
|
start_time = time.time() |
|
output = pipe( |
|
image=image, |
|
prompt=args.prompt, |
|
negative_prompt=args.negative_prompt, |
|
height=height, |
|
width=width, |
|
num_frames=args.num_frames, |
|
guidance_scale=args.guidance_scale, |
|
num_inference_steps=args.num_inference_steps, |
|
generator=generator, |
|
).frames[0] |
|
print(f"Time taken: {time.time() - start_time:.2f} seconds") |
|
|
|
export_to_video(output, args.output, fps=args.fps) |
|
|
|
max_memory = torch.cuda.max_memory_allocated()
print(f"Max memory: {max_memory / (1000 ** 3):.2f} GB")  # decimal GB, matching the table above
|
``` |
|
|
|
4. To run without CPU offloading (40GB VRAM required): |
|
```bash |
|
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python i2v.py |
|
``` |
|
|
|
To run with CPU offloading (22.5GB VRAM required): |
|
```bash |
|
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python i2v.py --cpu_offload |
|
``` |
|
> Setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` is strongly recommended to prevent out-of-memory errors caused by GPU memory fragmentation. |
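
If you prefer not to prefix every command, the same option can be set from inside the script; a minimal sketch (the variable must be set before PyTorch initializes its CUDA allocator, so place it above the `torch` import):

```python
import os

# Must run before the first CUDA allocation; easiest is before importing torch
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # deliberately imported after the environment variable is set
```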
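
A note on `--num_frames` (see the comment in `i2v.py` above): Wan's video VAE compresses time by 4x, so frame counts of the form `4*k + 1` map cleanly onto latent frames, and the default of 81 frames at 16 fps yields the roughly 5-second clips quoted above. A small hypothetical helper for snapping a requested length to a valid count:

```python
def nearest_valid_num_frames(requested: int) -> int:
    """Snap to the nearest 4*k + 1 frame count expected by Wan's temporal VAE."""
    k = max(1, round((requested - 1) / 4))
    return 4 * k + 1

print(nearest_valid_num_frames(80))  # -> 81, i.e. about 5 seconds at 16 fps
```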
|
|
|
### 📄 Learn More |
|
|
|
* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651) |
|
* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11) |
|
* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11) |
|
|