---
license: mit
---

# Scaffold-and-Fill Diffusion (SF-Diff): A Hybrid Architecture for Accelerated Language Model Inference

**Author:** Hilal Limo (Self-Taught Independent Researcher, Age 15)

**[➡️ Click here to read the full paper: SF-Diff-HL.pdf](SF-Diff-HL.pdf)**

---

## Abstract

Autoregressive transformer models, the dominant architecture for modern Large Language Models (LLMs), are fundamentally constrained by high inference latency because they generate tokens sequentially. In this paper, I propose Scaffold-and-Fill Diffusion (SF-Diff), a novel hybrid architecture designed to accelerate text generation by decomposing the task into two stages. The core hypothesis is that natural language can be separated into a semantic "scaffolding" of keywords and a grammatical "filler" of structural words. SF-Diff first uses a non-autoregressive diffusion model to generate the complete semantic scaffold, a sequence of keyword vector embeddings, in a fixed number of highly parallelizable steps. A lightweight autoregressive transformer decoder then performs a "grammatical infilling" task, weaving the structural words around the pre-generated semantic core. This approach aims to combine the holistic, parallel generation strengths of diffusion models with the grammatical precision of transformers, with the goal of substantially reducing inference latency while maintaining high-quality, coherent output.
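
To make the core hypothesis concrete, here is a toy sketch of the scaffold/filler split on plain text. It is not the SF-Diff model: there is no diffusion or transformer involved, and the `FILLER_WORDS` list and both function names are assumptions made up for illustration. It only shows that a sentence can be partitioned into position-tagged keywords and structural words, then reassembled without loss.

```python
# Toy illustration of the scaffold/filler hypothesis (NOT the SF-Diff model).
# FILLER_WORDS is a hypothetical, hand-picked list of structural words; how the
# paper actually defines "filler" is not specified in this README.
FILLER_WORDS = {
    "the", "a", "an", "of", "to", "in", "on", "and",
    "is", "are", "was", "by", "with", "for",
}


def split_scaffold_and_filler(sentence):
    """Partition a sentence into (position, word) lists: keywords and filler."""
    scaffold, filler = [], []
    for position, word in enumerate(sentence.lower().split()):
        target = filler if word in FILLER_WORDS else scaffold
        target.append((position, word))
    return scaffold, filler


def merge(scaffold, filler):
    """Reassemble the sentence by sorting all words by their original position."""
    ordered = sorted(scaffold + filler)
    return " ".join(word for _, word in ordered)


if __name__ == "__main__":
    scaffold, filler = split_scaffold_and_filler("the cat sat on the mat")
    print([word for _, word in scaffold])  # ['cat', 'sat', 'mat']
    print([word for _, word in filler])    # ['the', 'on', 'the']
    print(merge(scaffold, filler))         # the cat sat on the mat
```

In the architecture described above, the scaffold would instead be a sequence of keyword vector embeddings produced by the diffusion model, and the filler words would be generated by the autoregressive decoder rather than looked up.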

---

## Citation

If you find this work interesting, please consider citing the paper:

```bibtex
@misc{limo2025sfdiff,
  author = {Hilal Limo},
  title = {Scaffold-and-Fill Diffusion (SF-Diff): A Hybrid Architecture for Accelerated Language Model Inference},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/TimesLast/SF-Diff}}
}
```