|
--- |
|
base_model: |
|
- Wan-AI/Wan2.2-I2V-A14B |
|
base_model_relation: quantized |
|
pipeline_tag: image-to-video |
|
tags: |
|
- dfloat11 |
|
- df11 |
|
- lossless compression |
|
- 70% size, 100% accuracy |
|
--- |
|
|
|
# DFloat11 Compressed Model: `Wan-AI/Wan2.2-I2V-A14B` |
|
|
|
This is a **DFloat11 losslessly compressed** version of the original `Wan-AI/Wan2.2-I2V-A14B` model. It reduces model size by **32%** compared to the original BFloat16 model, while maintaining **bit-identical outputs** and supporting **efficient GPU inference**. |
|
|
|
🔥🔥🔥 Thanks to DFloat11 compression, `Wan-AI/Wan2.2-I2V-A14B` can now generate a 5-second 720P video on a single 24GB GPU, while maintaining full model quality. 🔥🔥🔥 |
|
|
|
### 📊 Performance Comparison |
|
|
|
| Model | Model Size | Peak GPU Memory (5-second 720P generation) | Generation Time (A100 GPU) | |
|
|----------------------------------------------------|--------------------|----------------------------------------------|----------------------------| |
|
| Wan-AI/Wan2.2-I2V-A14B (BFloat16)                  | ~56 GB             | OOM (out of memory)                           | N/A                        |
|
| Wan-AI/Wan2.2-I2V-A14B (DFloat11) | 19.47 + 19.44 GB | 29.12 GB | 42 minutes | |
|
| Wan-AI/Wan2.2-I2V-A14B (DFloat11 + CPU Offloading) | 19.47 + 19.44 GB | 20.01 GB | 44 minutes |

*DFloat11 sizes are reported as the sum of the two compressed transformer checkpoints loaded below: `transformer` (high-noise expert) and `transformer_2` (low-noise expert).*
|
|
|
### 🔍 How It Works |
|
|
|
We apply Huffman coding to the exponent bits of BFloat16 model weights, which are highly compressible. We leverage hardware-aware algorithmic designs to enable highly efficient, on-the-fly weight decompression directly on the GPU. Find out more in our [research paper](https://arxiv.org/abs/2504.11651). |
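To build intuition for why this works, here is a small illustrative sketch (not the DFloat11 implementation): it measures the empirical Shannon entropy of the exponent bits of Gaussian stand-in weights. Trained model weights behave similarly, with the 8-bit exponent carrying only around 2–3 bits of information.

```python
import math
from collections import Counter

import torch

# Stand-in for real model weights; trained weights are roughly Gaussian too
weights = torch.randn(1_000_000).to(torch.bfloat16)

# Reinterpret each BFloat16 value as a raw 16-bit pattern and extract the
# 8 exponent bits (bits 7-14)
bits = weights.view(torch.int16).to(torch.int32) & 0xFFFF
exponents = ((bits >> 7) & 0xFF).tolist()

# Empirical Shannon entropy of the exponent distribution, in bits per value
counts = Counter(exponents)
total = len(exponents)
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(f"Exponent entropy: {entropy:.2f} bits (8 bits stored)")
```

An entropy far below 8 bits means a Huffman code can shrink the exponents close to that bound, while the sign and mantissa bits are stored as-is; this is where the roughly 30% size reduction comes from.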
|
|
|
### 🔧 How to Use |
|
|
|
1. Install or upgrade the DFloat11 pip package *(installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed)*: |
|
|
|
```bash |
|
pip install -U "dfloat11[cuda12]"
|
``` |
|
|
|
2. Install the latest `diffusers` package from source: |
|
|
|
```bash |
|
pip install git+https://github.com/huggingface/diffusers |
|
``` |
|
|
|
3. Save the following code to a Python file `i2v.py`: |
|
|
|
```python |
|
import time |
|
import torch |
|
import numpy as np |
|
import argparse |
|
from diffusers import WanImageToVideoPipeline |
|
from diffusers.utils import export_to_video, load_image |
|
from dfloat11 import DFloat11Model |
|
|
|
parser = argparse.ArgumentParser(description='Image to Video generation using Wan2.2-I2V model') |
|
parser.add_argument('--cpu_offload', action='store_true', help='Enable CPU offloading') |
|
parser.add_argument('--image_path', type=str, default="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG", help='Path or URL to the input image') |
|
parser.add_argument('--width', type=int, default=1280, help='Output video width') |
|
parser.add_argument('--height', type=int, default=720, help='Output video height') |
|
parser.add_argument('--prompt', type=str, default="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.", help='Prompt for video generation') |
|
# The default negative prompt is Wan's recommended Chinese prompt, kept in Chinese to match the model's
# training. Roughly: "garish tones, overexposed, static, blurry details, subtitles, style, artwork,
# painting, still frame, overall gray, worst quality, low quality, JPEG artifacts, ugly, mutilated,
# extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs,
# fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards"
parser.add_argument('--negative_prompt', type=str, default="色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走", help='Negative prompt for video generation')
|
# Wan's temporal VAE expects 4*k + 1 frames (see the note after the run commands below)
parser.add_argument('--num_frames', type=int, default=81, help='Number of frames to generate')
|
parser.add_argument('--guidance_scale', type=float, default=3.5, help='Guidance scale for generation') |
|
parser.add_argument('--num_inference_steps', type=int, default=40, help='Number of inference steps') |
|
parser.add_argument('--seed', type=int, default=42, help='Random seed for generation') |
|
parser.add_argument('--output', type=str, default='i2v_output.mp4', help='Output video path') |
|
parser.add_argument('--fps', type=int, default=16, help='FPS of output video') |
|
|
|
args = parser.parse_args() |
|
|
|
image = load_image(args.image_path) |
|
|
|
# Load the Wan2.2 I2V pipeline in BFloat16; both of its transformers are
# replaced in-place with DFloat11-compressed weights below
pipe = WanImageToVideoPipeline.from_pretrained("Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16)
|
|
|
# Replace the weights of both diffusion transformers with their DFloat11-compressed
# versions: `transformer` (high-noise expert) and `transformer_2` (low-noise expert)
DFloat11Model.from_pretrained(
    "DFloat11/Wan2.2-I2V-A14B-DF11",
    device="cpu",
    cpu_offload=args.cpu_offload,
    bfloat16_model=pipe.transformer,
)
DFloat11Model.from_pretrained(
    "DFloat11/Wan2.2-I2V-A14B-2-DF11",
    device="cpu",
    cpu_offload=args.cpu_offload,
    bfloat16_model=pipe.transformer_2,
)
|
|
|
# Keep pipeline components on CPU and move each to the GPU only while it runs
pipe.enable_model_cpu_offload()
|
|
|
# Fit the image to the requested resolution: keep its aspect ratio, cap the area
# at width*height, and round both sides down to a multiple of the VAE/patch grid
max_area = args.width * args.height
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))
|
|
|
# Fixed seed for reproducible sampling
generator = torch.Generator(device="cuda").manual_seed(args.seed)
|
|
|
start_time = time.time() |
|
output = pipe( |
|
image=image, |
|
prompt=args.prompt, |
|
negative_prompt=args.negative_prompt, |
|
height=height, |
|
width=width, |
|
num_frames=args.num_frames, |
|
guidance_scale=args.guidance_scale, |
|
num_inference_steps=args.num_inference_steps, |
|
generator=generator, |
|
).frames[0] |
|
print(f"Time taken: {time.time() - start_time:.2f} seconds") |
|
|
|
export_to_video(output, args.output, fps=args.fps) |
|
|
|
max_memory = torch.cuda.max_memory_allocated()
print(f"Max memory: {max_memory / (1000 ** 3):.2f} GB")  # decimal GB, matching the table above
|
``` |
|
|
|
4. To run without CPU offloading (40GB VRAM required): |
|
```bash |
|
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python i2v.py |
|
``` |
|
|
|
To run with CPU offloading (22.5GB VRAM required): |
|
```bash |
|
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python i2v.py --cpu_offload |
|
``` |
|
> Setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` is strongly recommended to prevent out-of-memory errors caused by GPU memory fragmentation. |
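
If you prefer not to prefix every command, the same option can be set from inside the script; a minimal sketch (the variable must be set before PyTorch initializes its CUDA allocator, so place it above the `torch` import):

```python
import os

# Must run before the first CUDA allocation; easiest is before importing torch
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # deliberately imported after the environment variable is set
```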
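
A note on `--num_frames` (see the comment in `i2v.py` above): Wan's video VAE compresses time by 4x, so frame counts of the form `4*k + 1` map cleanly onto latent frames, and the default of 81 frames at 16 fps yields the roughly 5-second clips quoted above. A small hypothetical helper for snapping a requested length to a valid count:

```python
def nearest_valid_num_frames(requested: int) -> int:
    """Snap to the nearest 4*k + 1 frame count expected by Wan's temporal VAE."""
    k = max(1, round((requested - 1) / 4))
    return 4 * k + 1

print(nearest_valid_num_frames(80))  # -> 81, i.e. about 5 seconds at 16 fps
```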
|
|
|
### 📄 Learn More |
|
|
|
* **Paper**: [70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float](https://arxiv.org/abs/2504.11651) |
|
* **GitHub**: [https://github.com/LeanModels/DFloat11](https://github.com/LeanModels/DFloat11) |
|
* **HuggingFace**: [https://huggingface.co/DFloat11](https://huggingface.co/DFloat11) |
|
|