Mungert/SkyCaptioner-V1-GGUF · How to make gguf inference videos?

I did wonder how this might be done. I do not know how models encode video sequences so I am totally guessing an answer. If a video is a sequence of images then if you gave the model a sequence of images (before your text prompt) would it then infer that as a video. depends how it was trained and what special tokens it uses to indicate the difference between an image and video , if any. Worth a try as there is no video support from llama.cpp and vLlm is not supporting this model. You could also put in a feature https://github.com/ggml-org/llama.cpp/issues