Mixtral architecture yeah?

#18 opened by gghfez

So it looks like the Mixtral architecture: 8 experts, 2 active per token.

The original commit had "model_type": "mixtral" in the config.
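If you want to check the config fields yourself, something like this should work. The repo id is a placeholder, and the expert-count field names assume Mixtral-style naming, which may not match this model:

```python
# Sketch: inspect the config directly. "org/model-name" is a placeholder
# for the actual repo id; num_local_experts / num_experts_per_tok follow
# Mixtral's config conventions and may be named differently here.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("org/model-name", trust_remote_code=True)
print(cfg.model_type)                             # was "mixtral" in the original commit
print(getattr(cfg, "num_local_experts", None))    # 8 experts, if Mixtral naming
print(getattr(cfg, "num_experts_per_tok", None))  # 2 active per token
print(getattr(cfg, "residual_moe", None))         # the parameter in question
```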

Should be easy to get it running in llama.cpp then?

Not sure about this 'residual_moe' parameter though.
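For what it's worth, if 'residual_moe' means a dense FFN branch that runs alongside the top-2 routed experts (as in residual / shared-expert MoE designs), a layer might compute something roughly like this. Purely a sketch, not this model's actual code; every name below is a guess:

```python
import torch
import torch.nn as nn

class ResidualMoE(nn.Module):
    """Sketch of a residual-MoE block: top-2 routed experts plus an
    always-active dense branch added to their output. A guess at what
    `residual_moe` toggles; all names here are hypothetical."""

    def __init__(self, dim, hidden, n_experts=8, top_k=2, residual_moe=True):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        # Dense branch that every token passes through, regardless of routing.
        self.shared = (
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            if residual_moe
            else None
        )

    def forward(self, x):  # x: (tokens, dim)
        # Route each token to its top-k experts.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        # The "residual" part: add the dense branch onto the routed output.
        if self.shared is not None:
            out = out + self.shared(x)
        return out
```

Either way, that is exactly the kind of detail llama.cpp would need to implement explicitly before a straight Mixtral mapping would work.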

The architecture is almost identical to Grok-1, but it is definitely not Mixtral. Please follow https://github.com/ggml-org/llama.cpp/issues/15534 regarding llama.cpp support.
