GGUF support
Hello, the model looks very promising!
I'd like to try it locally via llama.cpp/Ollama. Will the model be available in GGUF format?
Thank you.
Always the same bulls***... nerds get top priority, but the average person who uses GGUF comes second... sigh.
I pushed an fp8 safetensors checkpoint you can run on a 3090 for now.
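For anyone who wants to try it, here is a minimal sketch of loading that kind of checkpoint with `transformers`. The repo id is a placeholder (not the actual upload), and it assumes the fp8 checkpoint loads through the standard `from_pretrained` path:

```python
# Minimal sketch: loading an fp8 safetensors checkpoint with transformers.
# The repo id below is a placeholder, not the actual upload.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/nemotron-h-fp8"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # fp8 weights should fit on a single 24 GB 3090
    trust_remote_code=True,  # new architectures often ship custom modeling code
)

inputs = tokenizer("Hello, Nemotron-H!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```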
I'm working on llama.cpp support today, which is required to even get a GGUF. Nemotron-H is a new hybrid architecture.
It's not some trivial thing: it's a 57-layer hybrid state-space model interwoven with transformer MLP layers (a toy sketch of the interleaving idea is below).
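To make the "hybrid" part concrete, here is a toy PyTorch sketch of an interleaved layer stack. The block implementations and the pattern are illustrative stand-ins only, not the real Nemotron-H configuration:

```python
# Toy sketch: SSM blocks interleaved with attention and MLP blocks.
# Block internals and the layer pattern are illustrative, not Nemotron-H's.
import torch
import torch.nn as nn

d_model = 64

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba-style state-space block: a gated linear scan."""
    def __init__(self, d):
        super().__init__()
        self.in_proj = nn.Linear(d, d)
        self.gate = nn.Linear(d, d)
        self.decay = nn.Parameter(torch.full((d,), 0.9))

    def forward(self, x):                    # x: (batch, seq, d)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):           # sequential scan over time
            h = self.decay * h + u[:, t]
            outs.append(h)
        y = torch.stack(outs, dim=1)
        return x + y * torch.sigmoid(self.gate(x))  # gated residual

class AttnBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, x):
        y, _ = self.attn(x, x, x, need_weights=False)
        return x + y

class MLPBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.net(x)

# Hypothetical interleaving: mostly SSM mixers, occasional attention,
# with an MLP after each mixer (the real 57-layer pattern differs).
pattern = (["S", "M", "S", "M", "A", "M"] * 3)[:12]
blocks = {"S": ToySSMBlock, "A": AttnBlock, "M": MLPBlock}
stack = nn.Sequential(*(blocks[p](d_model) for p in pattern))

x = torch.randn(2, 16, d_model)
print(stack(x).shape)  # torch.Size([2, 16, 64])
```

The point of the sketch is the conversion headache: a GGUF exporter and the llama.cpp graph both have to handle per-layer block types instead of one uniform transformer layer.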
Thank you for your interest and support!
There is ongoing discussion and work on Nemotron-H support for GGUF/llama.cpp. Please join the discussion and effort. Thank you!
https://github.com/ggml-org/llama.cpp/issues/15409
I have it working up to text generation; everything else up through token generation is done.
I'll push the code up sometime today.
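Once that support is merged, running the resulting GGUF locally should look roughly like this sketch using `llama-cpp-python`. The model path is a placeholder, and it assumes you're on a build that includes the new architecture support:

```python
# Sketch: running the eventual GGUF locally with llama-cpp-python,
# once Nemotron-H support is merged. The file name is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="nemotron-h.gguf",  # placeholder path to the converted model
    n_gpu_layers=-1,               # offload all layers to the GPU
    n_ctx=4096,
)

out = llm("Explain hybrid state-space models in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```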