GGUF support

#4 opened by RedEyed

Hello, model looks very promising!
I want to try it locally via llama.cpp/ollama. Will the model be available in GGUF format?

Thank you.

Always the same bulls*** .... nerds get top priority, but the average person who uses GGUF comes second... sigh

I pushed an FP8 safetensors version you can run on a 3090 for now.
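For anyone wanting to try that checkpoint, a minimal loading sketch with Hugging Face transformers might look like the following. The repo id is a placeholder for the actual FP8 upload, and `trust_remote_code` is assumed because hybrid architectures often ship custom modeling code; this is a sketch, not the uploader's exact instructions.

```python
# Minimal sketch: loading an FP8 safetensors checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "someuser/nemotron-h-fp8"  # placeholder; substitute the actual FP8 upload

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",      # keep the checkpoint's stored dtype
    device_map="auto",       # place layers on the available GPU(s)
    trust_remote_code=True,  # hybrid architectures often ship custom modeling code
)

inputs = tokenizer("Hello, Nemotron-H!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```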

Working on llama.cpp support today, which is required to even get a GGUF. Nemotron-H is a new hybrid architecture.

It's not a trivial thing: it's a 57-layer hybrid state-space model interwoven with transformer MLP layers.
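To make the "interwoven" part concrete, here is an illustrative PyTorch sketch of such a hybrid stack, with a plain linear layer standing in for the Mamba-style state-space mixer. The layer pattern, dimensions, and head count are invented for illustration and do not match NVIDIA's actual implementation.

```python
# Illustrative sketch only, not NVIDIA's implementation. It shows the general
# shape of a hybrid stack: mostly state-space (Mamba-style) mixer layers, with
# attention and MLP layers interleaved at fixed positions.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """One layer of the stack: an SSM stand-in, attention, or an MLP."""
    def __init__(self, d_model: int, kind: str):
        super().__init__()
        self.kind = kind
        self.norm = nn.LayerNorm(d_model)
        if kind == "attention":
            self.mixer = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        elif kind == "mlp":
            self.mixer = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
        else:  # "ssm": placeholder for a Mamba-style state-space mixer
            self.mixer = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        if self.kind == "attention":
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        return x + h  # residual connection

# Hypothetical interleaving pattern: 57 layers, mostly SSM, with a few
# attention layers mixed in. The real layout differs.
pattern = ["ssm"] * 57
for i in range(4, 57, 12):
    pattern[i] = "attention"

d_model = 512
stack = nn.ModuleList(HybridBlock(d_model, k) for k in pattern)

x = torch.randn(1, 16, d_model)  # (batch, sequence, hidden)
for block in stack:
    x = block(x)
```

The point of the sketch is why GGUF support is non-trivial: llama.cpp has to know, layer by layer, whether it is executing a state-space mixer or a transformer layer, so a new architecture definition is needed in the converter and the runtime.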

NVIDIA org

Thank you for your interest and your support!

There's ongoing discussion and work on Nemotron-H support for GGUF/llama.cpp. Please join the discussion and effort. Thank you!
https://github.com/ggml-org/llama.cpp/issues/15409
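Once that support lands and a GGUF has been produced (for example with llama.cpp's convert_hf_to_gguf.py), loading it from Python could look roughly like the sketch below via llama-cpp-python. The file name and parameters are placeholders, and it assumes a llama-cpp-python build against a llama.cpp version that includes Nemotron-H support.

```python
# Minimal sketch, assuming llama.cpp gains Nemotron-H support and a GGUF exists.
from llama_cpp import Llama

llm = Llama(
    model_path="nemotron-h.Q8_0.gguf",  # hypothetical file name
    n_ctx=4096,                         # context window size
    n_gpu_layers=-1,                    # offload all layers to the GPU
)

out = llm("Explain state-space models in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```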

I have it working up to text generation; everything else is done up to token generation. I'll push the code up sometime today.
