tags:
  - text-to-video
  - diffusion
  - merged-model
  - video-generation
  - wan2.1
widget:
  - text: >-
      Prompt: A gritty close-up of an elven princess kneeling in a rocky ravine,
      calming a wounded, desert dragon. Its scales are cracked, dry, She wears a
      crimson sash over bone-colored armor, her auburn hair half-tied back. The
      camera dollies in rapidly as she reaches for its eye ridge. Lighting comes
      from golden sunlight reflecting off surrounding rock, casting a warm,
      earthy hue with no artificial glow.
    output:
      url: videos/Video_00063.mp4
  - text: >-
      Prompt: Tight close-up of her smiling lips and sparkling eyes, catching
      golden hour sunlight. She wears a white sundress with floral prints and a
      wide-brimmed straw hat. Camera pulls back in a dolly motion, revealing her
      twirling under a cherry blossom tree. Petals flutter in the air, casting
      playful shadows. Soft lens flares enhance the euphoric, dreamlike vibe.
      (Before vs After — Left: Wan2.1 | Right: Merged model
      Wan14BT2V_MasterModel)
    output:
      url: videos/AnimateDiff_00001.mp4
  - text: >-
      Prompt: A gritty close-up of a dwarven beastmaster’s face, his grey beard
      braided tightly, brows furrowed as he looks just off-camera. The camera
      dollies out over his shoulder, revealing a perched gryphon watching him
      from a boulder, its feathers rustling slightly in the breeze. The moment
      holds stillness and mutual trust. Lighting is early daylight, clean and
      sharp with strong environmental clarity.
    output:
      url: videos/FusionX_00012.mp4
  - text: >-
      Prompt: A gritty close-up of a jungle tracker crouching low, face flushed
      with focus as she watches a perched macaw a few feet ahead. Her cheek
      twitches as she shifts forward, beads of sweat visible on her brow. The
      camera slowly dollies in from below her line of sight, capturing the
      moment her eyes widen in fascination. Lighting is rich and directional
      from above, creating a warm glow over her face with minimal shadows.
    output:
      url: videos/FusionX_00005.mp4
  - text: >-
      Prompt: A gritty close-up of a battle-worn ranger kneeling in a scorched
      clearing, calming a wounded gryphon whose wing is torn and bloodied. Its
      feathers are dusky bronze with streaks of ash-gray. She wears soot-covered
      hunter green armor, her blonde hair pulled into a loose braid. The camera
      dollies in as her hand brushes the creature's sharp beak. Lighting comes
      from late afternoon sun filtering through smoke, casting a burnt-orange
      haze across the frame.
    output:
      url: videos/Video_00069.mp4
base_model:
  - Wan-AI/Wan2.1-T2V-14B
license: apache-2.0

🌀 Wan2.1_14B_FusionX

High-Performance Merged Text-to-Video Model
Built on WAN 2.1 and fused with research-grade components for cinematic motion, detail, and speed. Optimized for ComfyUI and rapid iteration: usable in as few as 6 steps, with best quality at 8–10 steps.

📌 Important: To match the quality shown here, use the linked workflows or make sure to follow the recommended settings outlined below.


🌀 Preview Gallery

These are compressed GIF previews for quick viewing — final video outputs are higher quality.

[Preview GIFs: FusionX_00020 through FusionX_00031]


📂 Workflows & Model Downloads

🧠 GGUF Variants:


🎬 Example Videos

Want to see what FusionX can do? Check out these real outputs generated using the latest workflows and settings:


🚀 Overview

A powerful text-to-video model built on top of WAN 2.1 14B, merged with several research-grade models to boost:

  • Motion quality
  • Scene consistency
  • Visual detail

Comparable with closed-source solutions, but open and optimized for ComfyUI workflows.


💡 Inside the Fusion

This model includes the following merged components:

  • CausVid – Causal motion modeling for better flow and dynamics
  • AccVideo – Better temporal alignment and speed boost
  • MoviiGen1.1 – Cinematic smoothness and lighting
  • MPS Reward LoRA – Tuned for motion and detail
  • Custom LoRAs – For texture, clarity, and facial enhancements

All merged models use permissive open licenses (Apache 2.0 / MIT).


🔧 Usage Details

Text-to-Video

  • CFG: Must be set to 1
  • Shift:
    • 1024x576: Start at 1
    • 1080x720: Start at 2
    • For realism → lower values
    • For stylized → test 3–9
  • Scheduler:
    • Recommended: uni_pc
    • Alternative: flowmatch_causvid (better for some details)
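
The text-to-video settings above can be collected into a small lookup. This is a minimal sketch in plain Python, not a ComfyUI or wrapper API: the names `T2V_DEFAULTS`, `STARTING_SHIFT`, `starting_shift`, and `t2v_settings` are hypothetical, and only the values come from the list above.

```python
# Hypothetical helper: names are illustrative, only the values come from the list above.
T2V_DEFAULTS = {
    "cfg": 1,               # must stay at 1 for this merged model
    "scheduler": "uni_pc",  # alternative: "flowmatch_causvid"
}

# Suggested starting shift per (width, height); tune from there.
STARTING_SHIFT = {
    (1024, 576): 1,
    (1080, 720): 2,
}


def starting_shift(width: int, height: int) -> int:
    """Return the suggested starting shift, or raise for resolutions not listed."""
    try:
        return STARTING_SHIFT[(width, height)]
    except KeyError:
        raise ValueError(f"No suggested shift for {width}x{height}; tune manually") from None


def t2v_settings(width: int, height: int) -> dict:
    """Combine the fixed defaults with the resolution-dependent starting shift."""
    settings = dict(T2V_DEFAULTS)
    settings["shift"] = starting_shift(width, height)
    return settings


print(t2v_settings(1024, 576))
```

The shift returned here is only a starting point: per the list above, lower values suit realism and values in the 3–9 range are worth testing for stylized output.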

Image-to-Video

  • CFG: 1
  • Shift: 2 works best in most cases
  • Scheduler:
    • Recommended: dpm++_sde/beta
  • To boost motion and reduce slow-mo effect:
    • Frame count: 121
    • FPS: 24
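
For a sense of what the frame count and FPS above mean in practice, the clip length is simply frames divided by frames per second. The snippet below is a small arithmetic sketch; `clip_seconds` is a hypothetical helper name, not part of any workflow.

```python
def clip_seconds(frame_count: int, fps: float) -> float:
    """Playback length in seconds for a clip of frame_count frames shown at fps."""
    return frame_count / fps


# Settings from the list above: 121 frames at 24 FPS.
recommended = clip_seconds(121, 24)
print(f"{recommended:.2f} s")  # 5.04 s
```

So the recommended image-to-video settings yield a clip of roughly five seconds.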

🛠 Technical Notes

  • Works in as few as 6 steps
  • Best quality at 8–10 steps
  • Drop-in replacement for Wan2.1-T2V-14B
  • Up to 50% faster rendering, especially with SageAttn
  • Works natively and with the Kijai Wan Wrapper
    Wrapper GitHub
  • Do not re-add merged LoRAs (CausVid, AccVideo, MPS)
  • Feel free to add other LoRAs for style/variation
  • Native WAN workflows also supported (slightly slower)
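
One way to avoid re-adding the merged LoRAs is to screen a LoRA list by name before loading it. The sketch below is only a crude keyword heuristic under stated assumptions: the file names are invented for illustration, `split_loras` and `MERGED_LORA_KEYWORDS` are hypothetical names, and a short keyword such as `mps` can match unrelated files, so the result should be reviewed by hand.

```python
# Keywords for the LoRAs the notes above say are already merged into the model.
MERGED_LORA_KEYWORDS = ("causvid", "accvideo", "mps")


def split_loras(lora_names):
    """Split names into (safe_to_add, already_merged) by case-insensitive keyword match."""
    safe_to_add, already_merged = [], []
    for name in lora_names:
        lowered = name.lower()
        if any(keyword in lowered for keyword in MERGED_LORA_KEYWORDS):
            already_merged.append(name)
        else:
            safe_to_add.append(name)
    return safe_to_add, already_merged


# Invented file names, for illustration only.
keep, skip = split_loras(["CausVid_example.safetensors", "watercolor_style.safetensors"])
print("add:", keep)
print("skip:", skip)
```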

🧪 Performance Tips

  • RTX 5090 → ~138 sec/video at 1024x576 / 81 frames
  • If VRAM is limited:
    • Enable block swapping
    • Start with 5 blocks and adjust as needed
  • Use SageAttn for ~30% speedup (wrapper only)
  • Do not use teacache
  • "Enhance a video" (tested): Adds vibrance (try values 2–4)
  • "SLG" not tested — feel free to explore
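
The RTX 5090 figure above can be turned into a rough per-frame cost for planning longer clips. This is back-of-the-envelope arithmetic, not a benchmark: `estimate_seconds` is a hypothetical helper, and it assumes render time scales linearly with frame count at the same resolution and settings, which is likely optimistic for longer clips.

```python
# Figures from the list above: ~138 s for 81 frames at 1024x576 on an RTX 5090.
BENCHMARK_SECONDS = 138
BENCHMARK_FRAMES = 81

seconds_per_frame = BENCHMARK_SECONDS / BENCHMARK_FRAMES
print(f"{seconds_per_frame:.2f} s per frame")  # 1.70 s per frame


def estimate_seconds(frame_count: int) -> float:
    """Rough render time, ASSUMING linear scaling with frame count (an approximation)."""
    return seconds_per_frame * frame_count


print(f"{estimate_seconds(121):.0f} s for 121 frames")  # 206 s for 121 frames
```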

🧠 Prompt Help

Want better cinematic prompts? Try the WAN Cinematic Video Prompt Generator GPT — it adds visual richness and makes a big difference in quality. Download Here


📣 Join The Community

We’re building a friendly space to chat, share outputs, and get help.

  • Motion LoRAs coming soon
  • Tips, updates, and support from other users

👉 Join the Discord


⚖️ License

Merged under permissive licenses:

  • Apache 2.0 / MIT
  • You can use, modify, and redistribute
  • You must retain original license info
  • Outputs are not necessarily licensed — do your due diligence

This model is for research, education, and personal use only. Commercial use is your own responsibility. Please consult a legal advisor before monetizing outputs.


🙏 Credits

  • WAN Team (base model)
  • aejion (AccVideo)
  • Tianwei Yin (CausVid)
  • ZuluVision (MoviiGen)
  • Alibaba PAI (MPS LoRA)
  • Kijai (ComfyUI Wrapper)

And thanks to the open-source community!