Caching can be used in diffusion models as well, but it doesn't look like KV caching. KV caching is built for autoregressive next-token prediction, where the keys and values of earlier tokens stay fixed once they're computed. In diffusion, the full sequence already exists and all of it is updated at every denoising step, so there's no fixed prefix to cache.
The short answer to your question is yes: KV caching is specific to next-token prediction, but there are other caching techniques out there for other tasks.
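
To make the "fixed prefix" point concrete, here's a rough toy sketch of KV caching for a single attention head in pure Python. The names (`KVCache`, `project`, `attend`, the random weights) are all made up for illustration and don't come from any library. The idea is just that each new token computes its own key and value once, appends them to the cache, and attends over everything stored so far, which gives the same result as re-projecting the whole prefix on every step.

```python
import math
import random

random.seed(0)
D = 4  # toy hidden size (assumption, chosen only to keep the example small)

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical projection weights for one attention head.
W_Q, W_K, W_V = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)

def project(x, w):
    """Multiply a row vector x (length D) by a D x D matrix w."""
    return [sum(x[i] * w[i][j] for i in range(D)) for j in range(D)]

def attend(q, keys, values):
    """Softmax attention of a single query over lists of keys and values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(D)]

def last_token_full(prefix):
    """No cache: re-project every token in the prefix on each decoding step."""
    keys = [project(x, W_K) for x in prefix]
    values = [project(x, W_V) for x in prefix]
    return attend(project(prefix[-1], W_Q), keys, values)

class KVCache:
    """Stores keys/values of past tokens so they are projected only once."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x_new):
        # Only the new token is projected; older entries are reused as-is.
        self.keys.append(project(x_new, W_K))
        self.values.append(project(x_new, W_V))
        return attend(project(x_new, W_Q), self.keys, self.values)

tokens = rand_matrix(5, D)  # stand-in for 5 token embeddings
cache = KVCache()
cached = [cache.step(x) for x in tokens]
full = [last_token_full(tokens[: i + 1]) for i in range(len(tokens))]

# Largest element-wise gap between the cached and the recomputed outputs.
max_diff = max(abs(a - b) for c, f in zip(cached, full) for a, b in zip(c, f))
```

This only works because earlier tokens never change and never look at later ones. In a diffusion step every position changes, so the stored keys and values would be stale right away.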
Try checking what PrunaAI has been doing to optimize diffusion models. Here's a link to one of their previous presentations: LINKEDIN_POST
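
For a feel of what caching in diffusion can look like, here's a minimal toy sketch of one general idea: reuse the output of an expensive part of the denoiser across neighbouring steps instead of recomputing it every time. To be clear, this is not PrunaAI's method or API. `expensive_block`, `cheap_update`, and `cache_interval` are hypothetical names, and the arithmetic is a stand-in for a real network. Unlike KV caching, the result is only approximately the same as the uncached run.

```python
def expensive_block(x):
    """Stand-in for the costly deep layers of a denoiser (hypothetical)."""
    return [0.1 * xi for xi in x]

def cheap_update(x, deep):
    """Stand-in for the cheap layers plus the sampler update (hypothetical)."""
    return [xi - 0.05 * (xi + di) for xi, di in zip(x, deep)]

def denoise(x, num_steps, cache_interval):
    """Run the toy denoising loop, refreshing the cached features every
    `cache_interval` steps and reusing them in between."""
    deep_calls = 0
    cached = None
    for step in range(num_steps):
        if step % cache_interval == 0:
            cached = expensive_block(x)  # refresh the cache
            deep_calls += 1
        x = cheap_update(x, cached)      # may use slightly stale features
    return x, deep_calls

start = [1.0, -0.5, 2.0]
exact, exact_calls = denoise(start, num_steps=12, cache_interval=1)
approx, approx_calls = denoise(start, num_steps=12, cache_interval=3)

# Largest gap between the cached and the fully recomputed result.
max_gap = max(abs(a - b) for a, b in zip(exact, approx))
```

With `cache_interval=3` the expensive part runs on 4 of the 12 steps instead of all 12, at the cost of a small drift in the output. That trade-off (speed vs. a bit of error) is the main difference from KV caching, which is exact.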

