Mixtral architecture yeah?
#18 by gghfez
So it looks like the Mixtral architecture: 8 experts, 2 active.
The original commit had "model_type": "mixtral",
so it should be easy to get it running in llama.cpp then?
Not sure about this 'residual_moe' parameter though.
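For reference, here's a minimal, hypothetical sketch of that routing pattern (top-2 of 8 experts) in PyTorch. All names and dimensions are made up for illustration, this is not the model's actual code, and it deliberately leaves out `residual_moe` since the config alone doesn't say what that flag does:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Toy mixture-of-experts block: 8 experts, 2 active per token (illustrative only)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                       # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over the 2 chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                 # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out

x = torch.randn(4, 64)
print(Top2MoE()(x).shape)  # torch.Size([4, 64])
```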
Interesting
The architecture is almost identical to Grok-1, but it's definitely not Mixtral. Please follow https://github.com/ggml-org/llama.cpp/issues/15534 regarding llama.cpp support.