GGUF support

#4 opened by RedEyed

Hello, model looks very promising!
I want to try it locally via llama.cpp/ollama. Will the model be available in GGUF format?

Thank you.

Always the same bulls*** .... nerds get top priority, but the average person who uses GGUF comes second... sigh

I pushed an FP8 safetensors version you can run on a 3090 for now.
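For anyone wanting to try that checkpoint, a minimal loading sketch with Hugging Face transformers might look like the following. The repo id is a placeholder for the actual FP8 upload, and `trust_remote_code` is assumed because hybrid architectures often ship custom modeling code; this is a sketch, not the uploader's exact instructions.

```python
# Minimal sketch: loading an FP8 safetensors checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "someuser/nemotron-h-fp8"  # placeholder; substitute the actual FP8 upload

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",      # keep the checkpoint's stored dtype
    device_map="auto",       # place layers on the available GPU(s)
    trust_remote_code=True,  # hybrid architectures often ship custom modeling code
)

inputs = tokenizer("Hello, Nemotron-H!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```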

Working on llama.cpp support today, which is required to even get a GGUF. Nemotron-H is a new hybrid architecture.

It's not a trivial thing: it's a 57-layer hybrid state-space model interwoven with transformer MLP layers.
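To make the "interwoven" part concrete, here is an illustrative PyTorch sketch of such a hybrid stack, with a plain linear layer standing in for the Mamba-style state-space mixer. The layer pattern, dimensions, and head count are invented for illustration and do not match NVIDIA's actual implementation.

```python
# Illustrative sketch only, not NVIDIA's implementation. It shows the general
# shape of a hybrid stack: mostly state-space (Mamba-style) mixer layers, with
# attention and MLP layers interleaved at fixed positions.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """One layer of the stack: an SSM stand-in, attention, or an MLP."""
    def __init__(self, d_model: int, kind: str):
        super().__init__()
        self.kind = kind
        self.norm = nn.LayerNorm(d_model)
        if kind == "attention":
            self.mixer = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        elif kind == "mlp":
            self.mixer = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
        else:  # "ssm": placeholder for a Mamba-style state-space mixer
            self.mixer = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        if self.kind == "attention":
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        return x + h  # residual connection

# Hypothetical interleaving pattern: 57 layers, mostly SSM, with a few
# attention layers mixed in. The real layout differs.
pattern = ["ssm"] * 57
for i in range(4, 57, 12):
    pattern[i] = "attention"

d_model = 512
stack = nn.ModuleList(HybridBlock(d_model, k) for k in pattern)

x = torch.randn(1, 16, d_model)  # (batch, sequence, hidden)
for block in stack:
    x = block(x)
```

The point of the sketch is why GGUF support is non-trivial: llama.cpp has to know, layer by layer, whether it is executing a state-space mixer or a transformer layer, so a new architecture definition is needed in the converter and the runtime.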

NVIDIA org

Thank you for your interest and your support!

There's ongoing discussion and work on Nemotron-H support for GGUF/llama.cpp. Please join the discussion and effort. Thank you!
https://github.com/ggml-org/llama.cpp/issues/15409
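Once that support lands and a GGUF has been produced (for example with llama.cpp's convert_hf_to_gguf.py), loading it from Python could look roughly like the sketch below via llama-cpp-python. The file name and parameters are placeholders, and it assumes a llama-cpp-python build against a llama.cpp version that includes Nemotron-H support.

```python
# Minimal sketch, assuming llama.cpp gains Nemotron-H support and a GGUF exists.
from llama_cpp import Llama

llm = Llama(
    model_path="nemotron-h.Q8_0.gguf",  # hypothetical file name
    n_ctx=4096,                         # context window size
    n_gpu_layers=-1,                    # offload all layers to the GPU
)

out = llm("Explain state-space models in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```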

I have it working up to text generation; everything else is done up to token generation. I'll push the code up sometime today.
