ollama load error
Error: 500 Internal Server Error: unable to load model: /models/blobs/sha256-83c4df826f90104f41236f5d6af6405b690c494f80b18c1562d487f9e3da0c1e
Doesn't work in Ollama as of yet; you need to use llama.cpp, LM Studio, llama-server, or Docker.
Wait for ollama to update, or use the non-unsloth version from ollama.com
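In the meantime, here's a minimal sketch of serving the same quant directly with llama.cpp's llama-server (assuming a recent build compiled with libcurl so the -hf Hugging Face shortcut can download the files; adjust the quant tag and port to your setup):

# download the Unsloth GGUF from Hugging Face and serve it with an OpenAI-compatible API
llama-server -hf unsloth/gpt-oss-20b-GGUF:UD-Q4_K_XL --port 8080

Once it's up, any OpenAI-style client can point at http://localhost:8080/v1.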
I can confirm ollama doesn't work and I got the same error:
ollama run hf.co/unsloth/gpt-oss-20b-GGUF:UD-Q4_K_XL
pulling manifest
pulling 10fe673de12c: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 11 GB
pulling 51468a0fd901: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.4 KB
pulling 264230288548: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 149 B
pulling 2d976e69c233: 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 558 B
verifying sha256 digest
writing manifest
success
Error: 500 Internal Server Error: unable to load model: /usr/share/ollama/.ollama/models/blobs/sha256-10fe673de12c20b74b8d670a9fdf0fd36b43b0a86ffc04daeb175c0a2b98c4f9
At this moment Ollama doesn't support any GGUFs for gpt-oss
@Unsloth AI
Thanks for all your excellent work! Your quantizations are very useful. I'd love to see some unique custom fine-tunes from your team.
> At this moment Ollama doesn't support any GGUFs for gpt-oss
Generally I try to avoid commenting like this, but with more than 2.5 TB of local models in my Ollama OCI registry, the features Ollama has provided over the years have been extremely useful to me (I'd even contributed a simple bug fix with my, now deleted, GitHub account). However, the recent behavior of the Ollama team and the associated trends are alarming (rug-pull style?):
- Focus on "features" like Turbo
- Closed-source GUI
- Custom implementation for gpt-oss via direct coordination with OpenAI
- Lack of contribution to upstream projects
- Lack of communication regarding many of these important issues
I'm sure that most of the Ollama team does their best and nobody is perfect. I also understand the need for organizations to fund themselves. But unless this trend changes, I'll be migrating to the next best mature solution (which I don't think yet exists).
The pull request to watch is #11823. A good description of the problem is provided by Georgi Gerganov in this comment:
Since none of the maintainers here seem to care enough to explain the actual reason for ollama to not support the HF GGUF models, while the root cause is pretty obvious, I will help explain it:
Before the model was released, the ollama devs decided to fork the ggml inference engine in order to implement gpt-oss support (#11672). In the process, they did not coordinate the changes with the upstream maintainers of ggml. As a result, the ollama implementation is not only incompatible with the vast majority of gpt-oss GGUFs that everyone else uses, but is also significantly slower and unoptimized. On the bright side, they were able to announce day-1 support for gpt-oss and get featured in the major announcements on the release day.
Now after the model has been released, the blogs and marketing posts have circled the internet and the dust has settled, it's time for ollama to throw out their ggml fork and copy the upstream implementation (#11823). For a few days, you will struggle and wonder why none of the GGUFs work, wasting your time to figure out what is going on, without any help or even with some wrong information. But none of this matters, because soon the upstream version of ggml will be merged and ollama will once again be fast and compatible.
Hope this helps.
Same issue here: a gpt-oss 20B fine-tune converted to GGUF with Unsloth fails to run in Ollama, while the gpt-oss pulled directly from Ollama's own library works.
It's great that ggerganov spelled it out explicitly; that's fresh air and sanity.
The Ollama team has updated the llama.cpp integration in version v0.11.5+. The Unsloth quants now work for me™.
OH fantastic thanks for letting us know!
@lainedfles
thanks also, again, for your post above exposing the shameless behavior the Ollama team has been showing toward llama.cpp for some time now, because you motivated me to look AGAIN for an alternative, which I had tried several times before without ever finding a drop-in solution...
AND I FOUND IT!
I don't know how I could have missed it until now: llama-swap !
I just finished moving my whole GGUF library over, and now: goodbye ollama!
No more blobs!
And now I'm happier than ever! 🙏
So much so that I had to share it back and really encourage others to jump in too!
The usage is really simple: you have one yaml file instead of many modelfiles, and you start the server with one command, just like ollama serve. In the end I found myself spending far less time managing my library! Instead of editing a modelfile and then running ollama create -f blablabla, now I just save my yaml, restart the llama-swap server, and that's done!
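For anyone curious, here's a rough sketch of the kind of yaml I mean (model names and paths are made up for illustration; double-check the exact keys against the llama-swap README):

# config.yaml for llama-swap (illustrative; ${PORT} is a placeholder llama-swap fills in when it launches the backend)
models:
  "gpt-oss-20b":
    cmd: >
      llama-server
      --model /models/gpt-oss-20b-UD-Q4_K_XL.gguf
      --port ${PORT}
  "qwen2.5-coder-7b":
    cmd: >
      llama-server
      --model /models/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf
      --port ${PORT}

You start llama-swap once, pointed at that file, and send OpenAI-style requests to its port; it spins the matching llama-server up and down based on the model field of each request.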