CPU / vLLM / llama.cpp and GGUF in a boat

#6
by AchrafB67

It seems GGUF is not compatible with CPU-only inference in vLLM...
Furthermore, llama.cpp can run the GGUF model, but it does not provide multimodal support for this model...

Any suggestions for running the multimodal model quantized? (I'm on an old, CPU-only machine...)
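To frame what I'm after: the only CPU-only fallback I can think of is skipping GGUF and vLLM entirely, loading the original checkpoint with Transformers, and applying PyTorch dynamic int8 quantization. A minimal sketch of that idea (untested on my side; the repo id and image path are placeholders, and the exact processor call depends on the model):

```python
# Hypothetical CPU-only fallback: quantize the original multimodal
# checkpoint with PyTorch dynamic int8 quantization instead of GGUF.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "org/multimodal-model"  # placeholder, not the actual repo id

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly; it runs on CPU with no GPU or GGUF involved.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Placeholder inference call; the processor arguments vary per model.
image = Image.open("example.jpg")
inputs = processor(images=image, text="Describe this image.", return_tensors="pt")
with torch.no_grad():
    out = qmodel.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

Whether the speed and accuracy trade-off of something like this is bearable on old hardware, I don't know, so better suggestions are welcome.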
