---
license: cc-by-nc-sa-4.0
language:
- en
pipeline_tag: text-to-video
tags:
- text-to-video
- video-generation
- self-forcing
- gguf
base_model:
- gdhe17/Self-Forcing
---

# Self-Forcing2.1-T2V-1.3B-GGUF

📄 Self-Forcing | 🧬 Wan2.1 | 🤖 GGUF
---

Developed by Nichonauta.

This repository contains quantized versions, in **GGUF** format, of the **Self-Forcing** video generation model. Self-Forcing is an evolution of `Wan2.1-T2V-1.3B`, optimized with an innovative "self-forcing" technique that lets the model correct its own generation errors in real time, yielding more coherent, higher-quality videos.

These GGUF files allow the model to run efficiently on **GPU/CPU**, drastically reducing VRAM consumption and making video generation accessible without a high-end GPU.

## ✨ Key Features

- ⚡️ **GPU/CPU Inference:** Thanks to the GGUF format, the model runs on a wide range of hardware with optimized performance.
- 🧠 **Self-Forcing Technique:** The model learns from its own predictions during generation, improving the temporal consistency and visual quality of the video.
- 🖼️ **Image-guided Generation:** Can generate smooth video transitions between a start and an end image, guided by a text prompt.
- 📉 **Low Memory Consumption:** Quantization significantly reduces the RAM/VRAM footprint compared to the original `FP16`/`FP32` weights (see the estimate in the appendix below).
- 🧬 **Solid Base Architecture:** Inherits the powerful base of `Wan2.1-T2V-1.3B`, known for its efficiency and quality.

## Usage

The model files can be used in [ComfyUI](https://github.com/comfyanonymous/ComfyUI/) with the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node. A scripted alternative for fetching and inspecting the files is sketched in the appendix at the end of this card.

---

## 🧐 What is GGUF?

GGUF is a file format designed to store large language models (and other architectures) for fast inference on CPUs. Its key advantages are:

- **Fast Loading:** No complex deserialization step is required.
- **Quantization:** Model weights can be stored at reduced precision (e.g., 4 or 8 bits instead of 16 or 32), shrinking both file size and RAM usage.
- **GPU/CPU Execution:** It is optimized to run on general-purpose processors through libraries like `llama.cpp`.

**Note:** Running this video model in GGUF format requires compatible software that can interpret the video diffusion transformer architecture.

---

## 📚 Model Details and Attribution

This work would not be possible without the open-source projects that precede it.

### Base Model: Wan2.1

This model is based on `Wan2.1-T2V-1.3B`, a powerful 1.3-billion-parameter text-to-video model. It uses a Diffusion Transformer (DiT) architecture and a 3D VAE (Wan-VAE) optimized to preserve temporal information, making it well suited to video generation.

- **Original Repository:** [Wan-AI/Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
- **Architecture:** Diffusion Transformer (DiT) with a T5 text encoder.

### Optimization Technique: Self-Forcing

The `Wan2.1` model was enhanced with the **Self-Forcing** method, which trains the model to recognize and correct its own diffusion errors in a single forward pass. This improves fidelity and coherence while keeping generation fast enough for real-time use.

- **Project Page:** [self-forcing.github.io](https://self-forcing.github.io/)

---

## 🙏 Acknowledgements

We thank the teams behind [Wan2.1](https://huggingface.co/Wan-AI/), [Self-Forcing](https://self-forcing.github.io/), [Stable Diffusion](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [diffusers](https://github.com/huggingface/diffusers), and the entire [Hugging Face](https://huggingface.co) community for their contributions to the open-source ecosystem.
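---

## 🐍 Appendix: Fetching and Inspecting the GGUF Files

If you prefer to script the download instead of using the Hugging Face web UI, the sketch below fetches a single quantized file with `huggingface_hub` and reads its embedded metadata with the `gguf` Python package. This is a minimal sketch, not official tooling for this repo: the repo id and filename are assumptions, so check the "Files and versions" tab for the actual quantization variants.

```python
# Minimal sketch: download one GGUF file and inspect its metadata.
# Assumes `pip install huggingface_hub gguf`. REPO_ID and FILENAME are
# hypothetical placeholders; substitute the real values from this repo.
from huggingface_hub import hf_hub_download
from gguf import GGUFReader

REPO_ID = "Nichonauta/Self-Forcing2.1-T2V-1.3B-GGUF"  # hypothetical repo id
FILENAME = "self-forcing-t2v-1.3b-q4_0.gguf"          # hypothetical filename

# Download the file, or reuse it from the local Hugging Face cache.
path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# GGUF is self-describing: a header of key/value metadata followed by
# tensor descriptors, so no separate config file is needed.
reader = GGUFReader(path)
print(f"{len(reader.tensors)} tensors")
for key in list(reader.fields)[:10]:  # first few metadata keys
    print(key)
```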
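### Estimating the memory footprint

The "Low Memory Consumption" claim above is simple bits-per-weight arithmetic. A back-of-the-envelope estimate for 1.3B parameters is sketched below; real file sizes also include metadata and any tensors kept at higher precision, and the bits-per-weight figures are the usual `llama.cpp` conventions.

```python
# Rough size estimate for a 1.3B-parameter model at common precisions.
# Q8_0 stores 8-bit weights plus a per-block scale (~8.5 bits/weight);
# Q4_0 stores 4-bit weights plus a per-block scale (~4.5 bits/weight).
N_PARAMS = 1.3e9

for name, bits_per_weight in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    gib = N_PARAMS * bits_per_weight / 8 / 2**30
    print(f"{name}: ~{gib:.2f} GiB")

# FP16: ~2.42 GiB
# Q8_0: ~1.29 GiB
# Q4_0: ~0.68 GiB
```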
---

## ✍️ Citation

If you find our work useful, please cite the original projects:

```bibtex
@article{wan2.1,
  title  = {Wan: Open and Advanced Large-Scale Video Generative Models},
  author = {Wan Team},
  year   = {2025}
}

@misc{bar2024self,
  title         = {Self-Forcing for Real-Time Video Generation},
  author        = {Tal Bar and Roy Vovers and Yael Vinker and Eliahu Horwitz and Mark B. Zkharya and Yedid Hoshen},
  year          = {2024},
  eprint        = {2405.03358},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
```