---
license: cc-by-nc-sa-4.0
language:
  - en
pipeline_tag: text-to-video
tags:
  - text-to-video
  - video-generation
  - self-forcing
  - gguf
base_model:
  - gdhe17/Self-Forcing
---

# Self-Forcing2.1-T2V-1.3B-GGUF
📄 Self-Forcing    |    🧬 Wan2.1    |    🤖 GGUF
---
Developed by Nichonauta.
This repository contains quantized **GGUF** versions of the **Self-Forcing** video generation model.
The Self-Forcing model is an evolution of `Wan2.1-T2V-1.3B`, optimized with the "self-forcing" training technique, in which the model is exposed to its own generations during training so it learns to correct its own errors in real time. This results in more coherent and higher-quality videos.
These GGUF files allow the model to be run efficiently on **GPU/CPU**, drastically reducing VRAM consumption and making video generation accessible without the need for high-end GPUs.
## ✨ Key Features
- ⚡️ **GPU/CPU Inference:** Thanks to the GGUF format, the model can run on a wide range of hardware with optimized performance.
- 🧠 **Self-Forcing Technique:** The model is trained on its own predictions, which improves the temporal consistency and visual quality of the generated video.
- 🖼️ **Image-guided Generation:** Ability to generate smooth video transitions between a start and an end image, guided by a text prompt.
- 📉 **Low Memory Consumption:** Quantization significantly reduces the RAM/VRAM footprint compared to the original `FP16`/`FP32` weights.
- 🧬 **Based on a Solid Architecture:** It builds on the `Wan2.1-T2V-1.3B` model, a base known for its efficiency and quality.
## Usage
The model files can be used in [ComfyUI](https://github.com/comfyanonymous/ComfyUI/) with the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node. Typically, the `.gguf` file is placed in `ComfyUI/models/unet/` and loaded with the node pack's `Unet Loader (GGUF)` node in place of the standard checkpoint loader; check the ComfyUI-GGUF README for the current paths and node names.
---
## 🧐 What is GGUF?
GGUF is a binary file format, created for the `llama.cpp` project, for storing large language models (and other architectures) for fast local inference on CPUs and GPUs. The key advantages are:
- **Fast Loading:** Weights are memory-mapped directly from disk, with no complex deserialization step.
- **Quantization:** Model weights can be stored at reduced precision (e.g., 4 or 8 bits instead of 16 or 32), which shrinks both file size and RAM usage (see the back-of-envelope sketch after this list).
- **GPU/CPU Execution:** It is optimized for general-purpose processors through libraries like `llama.cpp`, with optional GPU offloading.
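To make the quantization savings concrete, here is a back-of-envelope estimate of file size versus precision for a 1.3B-parameter model (the bits-per-weight values are approximate effective rates for common `llama.cpp` quantization types):

```python
# Rough size estimate for a 1.3B-parameter model at different precisions.
# Bits-per-weight figures are approximate effective rates, not exact.
params = 1.3e9
for name, bits_per_weight in [("FP32", 32.0), ("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.5)]:
    size_gib = params * bits_per_weight / 8 / 2**30
    print(f"{name:8s} ~{size_gib:.2f} GiB")
```

At roughly 4.5 bits per weight, a 4-bit quant of this 1.3B model comes to well under 1 GiB, versus about 2.4 GiB for FP16.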
**Note:** Running this video model in GGUF format requires compatible software that can interpret the video diffusion transformer architecture.
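For example, the header of any `.gguf` file can be inspected with the `gguf` Python package published alongside `llama.cpp` (`pip install gguf`); the file name below is a placeholder for whichever quant you download:

```python
# Inspect a GGUF file's metadata and tensor listing without loading the weights.
from gguf import GGUFReader

reader = GGUFReader("model-Q4_K_M.gguf")  # placeholder: use your downloaded file

# Key/value metadata stored in the header (architecture, quantization, ...)
for name in reader.fields:
    print(name)

# The first few tensors: name, shape, and quantization type
for tensor in reader.tensors[:5]:
    print(tensor.name, list(tensor.shape), tensor.tensor_type.name)
```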
---
## 📚 Model Details and Attribution
This work would not be possible without the open-source projects that precede it.
### Base Model: Wan2.1
This model is based on `Wan2.1-T2V-1.3B`, a powerful 1.3 billion parameter text-to-video model. It uses a Diffusion Transformer (DiT) architecture and a 3D VAE (Wan-VAE) optimized to preserve temporal information, making it ideal for video generation.
- **Original Repository:** [Wan-AI/Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
- **Architecture:** Diffusion Transformer (DiT) with a T5 text encoder.
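For reference, the unquantized base model can be run with `diffusers`, which provides a `WanPipeline`. A minimal sketch follows; note that it loads the official Diffusers-format weights, not the GGUF files in this repository, and assumes a CUDA GPU:

```python
# Minimal sketch: run the original (unquantized) Wan2.1-T2V-1.3B with diffusers.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

frames = pipe(
    prompt="A red panda climbing a snowy pine tree, cinematic lighting",
    height=480, width=832,  # the 1.3B model targets 480p output
    num_frames=33,          # about 2 seconds at 16 fps
).frames[0]
export_to_video(frames, "output.mp4", fps=16)
```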
### Optimization Technique: Self-Forcing
The `Wan2.1` model was enhanced with the **Self-Forcing** method, which trains the model to recognize and correct its own diffusion errors by conditioning generation on its own previous outputs rather than on ground-truth frames. Closing this gap between training and inference improves fidelity and temporal coherence at no extra inference-time cost (a conceptual sketch follows below).
- **Project Page:** [self-forcing.github.io](https://self-forcing.github.io/)
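To make the idea concrete, here is a heavily simplified conceptual sketch. This is not the authors' code: `model.denoise`, the latent shape, and `video_level_loss` are hypothetical stand-ins.

```python
import torch

def self_forcing_rollout(model, text_emb, num_chunks=7, frames_per_chunk=3):
    """Autoregressively generate video latents, conditioning each chunk on the
    model's OWN previous outputs -- the same distribution it sees at test time."""
    context = []  # previously self-generated chunks, not ground-truth frames
    for _ in range(num_chunks):
        noise = torch.randn(frames_per_chunk, 16, 60, 104)  # hypothetical latent shape
        chunk = model.denoise(noise, text_emb, context)     # hypothetical API
        context.append(chunk)
    return torch.cat(context)

# Training supervises the *whole* self-generated rollout with a video-level
# loss, so mistakes made early in the rollout are penalized and the model
# learns to compensate for them:
#
#   video = self_forcing_rollout(model, text_emb)
#   loss = video_level_loss(video, text_emb)  # e.g. a distribution-matching loss
#   loss.backward()
```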
---
## 🙏 Acknowledgements
We thank the teams behind [Wan2.1](https://huggingface.co/Wan-AI/), [Self-Forcing](https://self-forcing.github.io/), [Stable Diffusion](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [diffusers](https://github.com/huggingface/diffusers), and the entire [Hugging Face](https://huggingface.co) community for their contributions to the open-source ecosystem.
## ✍️ Citation
If you find our work useful, please cite the original projects:
```bibtex
@misc{wan2025,
  title  = {Wan: Open and Advanced Large-Scale Video Generative Models},
  author = {Wan Team},
  year   = {2025}
}

@article{huang2025selfforcing,
  title   = {Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion},
  author  = {Xun Huang and Zhengqi Li and Guande He and Mingyuan Zhou and Eli Shechtman},
  journal = {arXiv preprint arXiv:2506.08009},
  year    = {2025}
}
```