---
license: cc-by-nc-sa-4.0
language:
- en
pipeline_tag: text-to-video
tags:
- text-to-video
- video-generation
- self-forcing
- gguf
base_model:
- gdhe17/Self-Forcing
---

# Self-Forcing2.1-T2V-1.3B-GGUF

📄 Self-Forcing | 🧬 Wan2.1 | 🤖 GGUF
---

Developed by Nichonauta.

This repository contains quantized versions, in **GGUF** format, of the **Self-Forcing** video generation model. Self-Forcing is an evolution of `Wan2.1-T2V-1.3B`, optimized with an innovative "self-forcing" technique that lets the model correct its own generation errors in real time, yielding more coherent, higher-quality videos.

These GGUF files allow the model to run efficiently on **GPU/CPU**, drastically reducing VRAM consumption and making video generation accessible without a high-end GPU.

## ✨ Key Features

- ⚡️ **GPU/CPU Inference:** Thanks to the GGUF format, the model runs on a wide range of hardware with optimized performance.
- 🧠 **Self-Forcing Technique:** The model learns from its own predictions during generation, improving the temporal consistency and visual quality of the video.
- 🖼️ **Image-guided Generation:** Can generate smooth video transitions between a start and an end image, guided by a text prompt.
- 📉 **Low Memory Consumption:** Quantization significantly reduces the RAM/VRAM footprint compared to the original `FP16`/`FP32` weights (see the estimate in the appendix below).
- 🧬 **Solid Base Architecture:** Inherits the powerful base of `Wan2.1-T2V-1.3B`, known for its efficiency and quality.

## Usage

The model files can be used in [ComfyUI](https://github.com/comfyanonymous/ComfyUI/) with the [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) custom node. A scripted alternative for fetching and inspecting the files is sketched in the appendix at the end of this card.

---

## 🧐 What is GGUF?

GGUF is a file format designed to store large language models (and other architectures) for fast inference on CPUs. Its key advantages are:

- **Fast Loading:** No complex deserialization step is required.
- **Quantization:** Model weights can be stored at reduced precision (e.g., 4 or 8 bits instead of 16 or 32), shrinking both file size and RAM usage.
- **GPU/CPU Execution:** It is optimized to run on general-purpose processors through libraries like `llama.cpp`.

**Note:** Running this video model in GGUF format requires compatible software that can interpret the video diffusion transformer architecture.

---

## 📚 Model Details and Attribution

This work would not be possible without the open-source projects that precede it.

### Base Model: Wan2.1

This model is based on `Wan2.1-T2V-1.3B`, a powerful 1.3-billion-parameter text-to-video model. It uses a Diffusion Transformer (DiT) architecture and a 3D VAE (Wan-VAE) optimized to preserve temporal information, making it well suited to video generation.

- **Original Repository:** [Wan-AI/Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B)
- **Architecture:** Diffusion Transformer (DiT) with a T5 text encoder.

### Optimization Technique: Self-Forcing

The `Wan2.1` model was enhanced with the **Self-Forcing** method, which trains the model to recognize and correct its own diffusion errors in a single forward pass. This improves fidelity and coherence while keeping generation fast enough for real-time use.

- **Project Page:** [self-forcing.github.io](https://self-forcing.github.io/)

---

## 🙏 Acknowledgements

We thank the teams behind [Wan2.1](https://huggingface.co/Wan-AI/), [Self-Forcing](https://self-forcing.github.io/), [Stable Diffusion](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [diffusers](https://github.com/huggingface/diffusers), and the entire [Hugging Face](https://huggingface.co) community for their contributions to the open-source ecosystem.
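---

## 🐍 Appendix: Fetching and Inspecting the GGUF Files

If you prefer to script the download instead of using the Hugging Face web UI, the sketch below fetches a single quantized file with `huggingface_hub` and reads its embedded metadata with the `gguf` Python package. This is a minimal sketch, not official tooling for this repo: the repo id and filename are assumptions, so check the "Files and versions" tab for the actual quantization variants.

```python
# Minimal sketch: download one GGUF file and inspect its metadata.
# Assumes `pip install huggingface_hub gguf`. REPO_ID and FILENAME are
# hypothetical placeholders; substitute the real values from this repo.
from huggingface_hub import hf_hub_download
from gguf import GGUFReader

REPO_ID = "Nichonauta/Self-Forcing2.1-T2V-1.3B-GGUF"  # hypothetical repo id
FILENAME = "self-forcing-t2v-1.3b-q4_0.gguf"          # hypothetical filename

# Download the file, or reuse it from the local Hugging Face cache.
path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# GGUF is self-describing: a header of key/value metadata followed by
# tensor descriptors, so no separate config file is needed.
reader = GGUFReader(path)
print(f"{len(reader.tensors)} tensors")
for key in list(reader.fields)[:10]:  # first few metadata keys
    print(key)
```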
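### Estimating the memory footprint

The "Low Memory Consumption" claim above is simple bits-per-weight arithmetic. A back-of-the-envelope estimate for 1.3B parameters is sketched below; real file sizes also include metadata and any tensors kept at higher precision, and the bits-per-weight figures are the usual `llama.cpp` conventions.

```python
# Rough size estimate for a 1.3B-parameter model at common precisions.
# Q8_0 stores 8-bit weights plus a per-block scale (~8.5 bits/weight);
# Q4_0 stores 4-bit weights plus a per-block scale (~4.5 bits/weight).
N_PARAMS = 1.3e9

for name, bits_per_weight in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    gib = N_PARAMS * bits_per_weight / 8 / 2**30
    print(f"{name}: ~{gib:.2f} GiB")

# FP16: ~2.42 GiB
# Q8_0: ~1.29 GiB
# Q4_0: ~0.68 GiB
```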
---

## ✍️ Citation

If you find our work useful, please cite the original projects:

```bibtex
@article{wan2.1,
  title  = {Wan: Open and Advanced Large-Scale Video Generative Models},
  author = {Wan Team},
  year   = {2025}
}

@misc{bar2024self,
  title         = {Self-Forcing for Real-Time Video Generation},
  author        = {Tal Bar and Roy Vovers and Yael Vinker and Eliahu Horwitz and Mark B. Zkharya and Yedid Hoshen},
  year          = {2024},
  eprint        = {2405.03358},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
```