Any result examples? [V1 examples inside, outdated]

#1
by infoperobueno - opened

Please, can you provide some examples? Thank you.

I can't say it's better than 2.1, it's slightly different

Note that the 2.1 model used is "Wan2_1-T2V-14B_fp8_e4m3fn" from the Kijai repo, which is only 13.8 GB

Thank you for the comparisons! I'm curious how more dynamic scenes would look, as there isn't much happening in these videos other than slow pans. I do notice that the cigarette smoker has much better motion in the 2.2 merge as one point of improvement.

Yes, I will post more dynamic comparisons (consecutive seeds, so not cherry-picked, of course). It's unfortunate that the metadata gets removed when posted on HF; I would prefer it to be available for transparency

Lots of examples here and they do seem mostly similar. The first video in your new set seems to mess up her turn in my WAN 2.2 merge. However, in the scene with the pair walking up to the storefront glass, my WAN 2.2 merge gets a neat reflection where WAN 2.1 doesn't. I'm wondering how it may be possible to pull in more of WAN 2.2 without breaking too much LoRA compatibility (or the ability to stay one model)... I think a Lightx2v / PUSA LoRA designed for WAN 2.2 is what is really needed.

I don't believe PUSA is a necessary method for WAN 2.2. First, that model is trained on synthetic data; second, stacking multiple LoRAs leads to diminishing returns, even when combining just two of them. Users are likely to add their own LoRAs on top of the stack, further complicating the process.

While combining both models sounds appealing, the WAN team likely would have done so already if it offered clear advantages. The decision to split the model probably stems from observed benefits: requiring two separate samplers is impractical, and likely something the team would have avoided unless it provided significant improvements.

The problem with 2.2 is that block swapping, LoRA merging, and torch.compile all need to be computed twice, once per model; that's what makes 2.2 so slow.

Splitting the model into two definitely has its benefits, but it complicates many use cases, particularly in the search for speed. I believe that is why they also released the 5B model: to offer something faster, since their pair of 14B models is slower. However, the 5B (from what I've seen and heard) is a bit of a disappointment and perhaps not ready for prime time yet. So, this merge is designed to bring some improvements from WAN 2.2 while still prioritizing the speed and efficiency available now with WAN 2.1. I believe it hits that mark, and I am looking forward to further improvements anywhere they can be found.

I just made this one. It seemed sped up (surprisingly), so I jumped on the occasion to interpolate 2x and get a twice-as-long video lol.

800x800, 181 frames, image-to-video.

It used most of my 32GB RAM and 16GB VRAM.

@PGCRYPT: care to run some comparisons again with my experimental v2 frankenmerge?

https://huggingface.co/Phr00t/WAN2.2-14B-Rapid-AllInOne/tree/main/v2

Phr00t changed discussion title from Any result examples? to Any result examples? [V1 examples inside, outdated]
