Update README.md
README.md CHANGED
---
license: apache-2.0
---

<H2>How To Set and Manage MOE - Mixture of Experts - Model Activation of Experts</H2>

This document explains how to set/change the number of active experts in Mixture of Experts (MOE) models in various LLM/AI apps, and includes links
to additional MOE models and other helpful resources.

---
<h2>LINKS:</h2>

---

<H2>Mixture of Experts Models - including Reasoning/Thinking:</H2>

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

<h2>Additional:</h2>

<B>#1 All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):</B>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ]

<B>#2 All Reasoning/Thinking Models - including MOEs - (collection) (source code for generating GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct use):</B>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ]

<B>#3 All Adapters (collection) - turn a "regular" model into a "thinking/reasoning" model:</B>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ]

These collections are updated over time; the newest items are usually at the bottom of each collection.

---

<H2>Main Document - Setting Mixture of Experts in LLM/AI Apps</H2>

---
<B>Experts Activation / Models used to build this model:</B>

Special thanks to all the model makers for the models used in this MOE model.

To be updated...

The mixture of experts is set to 4 experts by default, but you can use 1, 2, 3, or 4.

This "team" has a Captain (the first listed model), and then all the team members contribute to the "token"
choice billions of times per second. Note that the Captain contributes too.

Think of 2, 3, or 4 (or more) master chefs in the kitchen, all competing to make the best dish for you.

This results in higher-quality generation and, in many cases, higher-quality instruction following too.

That means the power of every model is available during instruction following and output generation.
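As a rough, schematic illustration of the routing idea described above (a generic top-k MOE sketch, not this model's actual code; the numbers and the stand-in "expert" functions are made up), each token's hidden state is scored by a router, only the top "experts used" experts run, and their outputs are blended by the normalized router weights:

```python
# Generic top-k MOE routing sketch -- illustration only, not this model's code.
# A router scores every expert for the current token, the top "experts_used"
# experts are run, and their outputs are blended by the normalized router weights.
import math
import random

NUM_EXPERTS = 4    # experts built into the model
EXPERTS_USED = 2   # the value you change in LM Studio / KoboldCPP / llama.cpp


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token_state):
    # Router: one score per expert (random here; learned weights in a real model).
    router_scores = [random.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

    # Keep only the top-k experts for this token; the rest are skipped entirely.
    top_k = sorted(range(NUM_EXPERTS), key=lambda i: router_scores[i], reverse=True)[:EXPERTS_USED]
    weights = softmax([router_scores[i] for i in top_k])

    # Each selected expert transforms the token state; results are blended by weight.
    blended = 0.0
    for weight, idx in zip(weights, top_k):
        expert_output = (idx + 1) * token_state  # stand-in for a real expert network
        blended += weight * expert_output
    return blended


print(moe_layer(0.5))
```

Raising "experts used" simply means more of those expert networks contribute to every token choice, at the cost of a little more compute per token.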
NOTE:

You can also use just one "expert"; however, this means the model will randomly select an expert to use EACH TIME, resulting
in very different generation for each prompt / regeneration of a prompt.

CHANGING THE NUMBER OF EXPERTS:

In LMStudio (https://lmstudio.ai) you set the number of experts on the model "load" screen; in other LLM apps, look for a setting called "Experts" or "Number of Experts".

For Text-Generation-WebUI (https://github.com/oobabooga/text-generation-webui) you set the number of experts on the model loading screen/page.

For KoboldCPP (https://github.com/LostRuins/koboldcpp) version 1.8+, on the load screen click on "TOKENS";
you can set the number of experts on that page, and then launch the model.
For server.exe / llama-server.exe (llama.cpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md ),
add the following to the command line when starting the llama.cpp server (CLI):

"--override-kv llama.expert_used_count=int:3"

(no quotes, where "3" is the number of experts to use)
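For example (a sketch only - the model filename below is a placeholder for whatever GGUF file you are loading), a full launch command would look like:

```
llama-server -m your-moe-model.gguf --override-kv llama.expert_used_count=int:3
```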
When using an API, you set "num_experts_used" in the JSON payload (the exact field name may differ between back ends).
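As an illustration only - the endpoint URL/port and the payload field names below are assumptions, so check your back end's API documentation for the exact names it expects - a request that sets the expert count might look like this:

```python
# Illustration only: the URL/port and the "num_experts_used" field are assumptions;
# consult your back end's API documentation for the exact field names it expects.
import json
import urllib.request

payload = {
    "prompt": "Write a short scene set in a lighthouse during a storm.",
    "max_tokens": 400,            # field name varies by back end
    "temperature": 0.8,
    "num_experts_used": 3,        # number of experts to activate (back-end dependent)
}

request = urllib.request.Request(
    "http://localhost:5000/v1/completions",   # replace with your back end's endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))
```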
<B>SUGGESTION:</B>

The MOE models in my repo:

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

contain various examples, including example generations with 2, 4, and 8 experts.

These will give you a better idea of what changes to expect when adjusting the number of experts
and of its effect on generation.