---
license: mit
language:
  - en
base_model:
  - Wan-AI/Wan2.1-I2V-14B-480P
  - Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
pipeline_tag: image-to-video
library_name: diffusers
tags:
  - AIGC
  - LoRA
  - adapter
---

Please refer to our GitHub repository for more information: https://github.com/alibaba/wan-toy-transform


<div align="center">
<h2>Wan Toy Transform</h2>
<br>
Alibaba Research Intelligence Computing
<br>
<a href="https://github.com/alibaba/wan-toy-transform"><img src='https://img.shields.io/badge/Github-Link-black'></a>
<a href='https://modelscope.cn/models/Alibaba_Research_Intelligence_Computing/wan-toy-transform'><img src='https://img.shields.io/badge/🤖_ModelScope-weights-%23654dfc'></a>
<a href='https://huggingface.co/Alibaba-Research-Intelligence-Computing/wan-toy-transform'><img src='https://img.shields.io/badge/🤗_HuggingFace-weights-%23ff9e0e'></a>
<br>
</div>

This is a LoRA model fine-tuned from [Wan-I2V-14B-480P](https://github.com/Wan-Video/Wan2.1). It transforms objects in the input image into fluffy toys.

## 🐍 Installation

```bash
# Tested with Python 3.12 and PyTorch 2.6.0 (CUDA 12.4 wheels).
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
```
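
After installing, you can optionally confirm that the CUDA-enabled PyTorch build is active before running inference. This is a small sanity check, not part of the repository's scripts:

```python
# Optional sanity check: verify the pinned CUDA-enabled PyTorch build is installed.
import torch

print(torch.__version__)          # expect 2.6.0+cu124
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine
```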

## 🔄 Inference

```bash
python generate.py --prompt "The video opens with a clear view of a $name. Then it transforms to a JellyCat-style $name. It has a face and a cute, fluffy and playful appearance." --image $image_path --save_file "output.mp4" --offload_type leaf_level
```

Note:

- Replace `$name` with the name of the object you want to transform.
- Replace `$image_path` with the path to the first-frame image.
- Choose `--offload_type` from `leaf_level`, `block_level`, `model`, or `none`. More details can be found [here](https://huggingface.co/docs/diffusers/optimization/memory#group-offloading).
- VRAM usage and generation time for each `--offload_type` are listed below; a diffusers-based sketch follows the table.

  | `--offload_type`                     | VRAM Usage | Generation Time (NVIDIA A100) |
  | ------------------------------------ | ---------- | ----------------------------- |
  | leaf_level                           | 11.9 GB    | 17m17s                        |
  | block_level (num_blocks_per_group=1) | 20.5 GB    | 16m48s                        |
  | model                                | 39.4 GB    | 16m24s                        |
  | none                                 | 55.9 GB    | 16m08s                        |
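
If you prefer to call the Diffusers base model directly instead of `generate.py`, the sketch below shows one plausible way to load this LoRA and enable group offloading. It is a minimal, unverified sketch: the pipeline, `load_lora_weights`, and group-offloading calls follow the standard diffusers APIs for Wan2.1, while the resolution, frame count, dtypes, and file paths are assumptions you may need to adjust.

```python
# Minimal sketch: diffusers-based inference with this LoRA. Paths, resolution,
# frame count, and dtypes are illustrative assumptions, not the repo's defaults.
import torch
from diffusers import AutoencoderKLWan, WanImageToVideoPipeline
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video, load_image

onload_device = torch.device("cuda")
offload_device = torch.device("cpu")

base = "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers"
# The VAE is kept in float32 for numerical stability, as in the diffusers Wan examples.
vae = AutoencoderKLWan.from_pretrained(base, subfolder="vae", torch_dtype=torch.float32)
pipe = WanImageToVideoPipeline.from_pretrained(base, vae=vae, torch_dtype=torch.bfloat16)

# Load this card's LoRA adapter (assumes the weights load via load_lora_weights).
pipe.load_lora_weights("Alibaba-Research-Intelligence-Computing/wan-toy-transform")

# Group offloading, analogous to --offload_type leaf_level in generate.py.
pipe.transformer.enable_group_offload(
    onload_device=onload_device, offload_device=offload_device, offload_type="leaf_level"
)
apply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_type="leaf_level")
apply_group_offloading(pipe.image_encoder, onload_device=onload_device, offload_type="leaf_level")
apply_group_offloading(pipe.vae, onload_device=onload_device, offload_type="leaf_level")

name = "cat"                            # object to transform (example value)
image = load_image("first_frame.png")   # path to your first-frame image (example path)
prompt = (
    f"The video opens with a clear view of a {name}. Then it transforms to a "
    f"JellyCat-style {name}. It has a face and a cute, fluffy and playful appearance."
)

# 480P output; 81 frames at 16 fps is a common Wan2.1 setting (assumption here).
frames = pipe(image=image, prompt=prompt, height=480, width=832, num_frames=81).frames[0]
export_to_video(frames, "output.mp4", fps=16)
```

Swapping `offload_type="leaf_level"` for `offload_type="block_level"` with `num_blocks_per_group=1` should correspond to the second table row, trading VRAM for a slightly shorter generation time.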

## 🤝 Acknowledgements

Special thanks to these projects for their contributions to the community!

- [Wan2.1](https://github.com/Wan-Video/Wan2.1)
- [diffusion-pipe](https://github.com/tdrussell/diffusion-pipe)
- [diffusers](https://github.com/huggingface/diffusers)

## 📄 Our previous work

- [Tora: Trajectory-oriented Diffusion Transformer for Video Generation](https://github.com/alibaba/Tora)

- [AnimateAnything: Fine Grained Open Domain Image Animation with Motion Guidance](https://github.com/alibaba/animate-anything)