Update README.md

---
license: apache-2.0
---

<H2>How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts</H2>

This document discusses how to set/change the number of activated experts in Mixture of Experts (MOE) models in various LLM/AI apps, and includes links to additional MOE models and other helpful resources.

---

<h2>LINKS:</h2>

---

<H2>Mixture Of Expert Models - including Reasoning/Thinking:</H2>

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

<h2>Additional:</h2>

<B>#1 All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ]

<B>#2 All Reasoning/Thinking Models - including MOEs - (collection) (source files to generate GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct usage):</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ]

<B>#3 All Adapters (collection) - Turn a "regular" model into a "thinking/reasoning" model:</b>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ]

These collections will update over time. The newest items are usually at the bottom of each collection.

---

<H2>Main Document - Setting Mixture Of Experts in LLM/AI apps</H2>

---

<B>Experts Activation / Models used to build this model:</B>

Special thanks to all the model makers for the models used in this MOE model:

To be updated...

The number of experts is set to 4 by default, but you can use 1, 2, 3, or 4.

This "team" has a Captain (the first-listed model), and all the team members - the Captain included - contribute to each "token" choice billions of times per second.

Think of 2, 3 or 4 (or more) master chefs in the kitchen, all competing to make the best dish for you.

This results in higher quality generation.

In many cases it also results in higher quality instruction following.

That means the power of every model is available during instruction and output generation.

NOTE:

You can use one "expert" too; however, this means the model will randomly select an expert to use EACH TIME, resulting in very different generation for each prompt / regen of a prompt.

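For intuition, here is a rough sketch (in Python, with toy sizes and random weights) of what a MOE layer does with this setting for each token: a small router scores all available experts, only the top-scoring "number of experts" are actually run, and their outputs are blended together by the router's weights. This is a generic illustration of the technique only, not the actual implementation of this model or of any particular app.

```python
# Schematic sketch of top-k expert routing inside one MOE layer, using toy sizes
# and random weights. For intuition only -- NOT this model's actual code.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 4    # experts available in the model
EXPERTS_USED = 2   # the "number of experts" setting discussed above
HIDDEN = 8         # toy hidden size

# Toy experts: each "expert" is just a small matrix here.
experts = [rng.normal(size=(HIDDEN, HIDDEN)) for _ in range(NUM_EXPERTS)]
router = rng.normal(size=(HIDDEN, NUM_EXPERTS))

def moe_layer(token_vec):
    # The router scores every expert for this token...
    scores = token_vec @ router
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # ...but only the EXPERTS_USED highest-scoring experts are actually run.
    chosen = np.argsort(probs)[-EXPERTS_USED:]
    gate = probs[chosen] / probs[chosen].sum()   # re-normalize over the chosen experts
    # Blend the chosen experts' outputs using the router weights.
    return sum(g * (token_vec @ experts[i]) for g, i in zip(gate, chosen))

print(moe_layer(rng.normal(size=HIDDEN)))  # one token's output from this layer
```
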
CHANGING THE NUMBER OF EXPERTS:

You can set the number of experts in LMStudio (https://lmstudio.ai) on the model "load" screen; in other LLM apps, look for a setting named "Experts" or "Number of Experts".

For Text-Generation-Webui (https://github.com/oobabooga/text-generation-webui) you set the number of experts on the model loading page.

For KoboldCpp (https://github.com/LostRuins/koboldcpp) version 1.8+, on the load screen click on "TOKENS"; you can set the number of experts on that page, then launch the model.

For server.exe / llama-server.exe (llama.cpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md ) add the following to the command line used to start the "llama.cpp server" (CLI):

--override-kv llama.expert_used_count=int:3

(where "3" is the number of experts to use)

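To make it clearer where that flag sits in a complete command, here is a small Python sketch that launches the server via subprocess. The binary name, model filename, context size, and port below are placeholder/example values; the only part taken from the instructions above is the --override-kv flag.

```python
# Sketch: launching llama-server with the number of active experts overridden.
# "llama-server" must be on your PATH; "MOE-model-Q4_K_M.gguf", the context size
# and the port are placeholder values -- substitute your own.
import subprocess

cmd = [
    "llama-server",
    "-m", "MOE-model-Q4_K_M.gguf",                      # the MOE GGUF file to load
    "-c", "8192",                                       # context size (example value)
    "--port", "8080",                                   # example port
    "--override-kv", "llama.expert_used_count=int:3",   # number of experts to use
]

subprocess.run(cmd, check=True)  # blocks while the server is running
```
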
When using an "API", you set "num_experts_used" in the JSON payload (this may be different for different back ends).

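As a rough template for that, the Python sketch below sends a completion request with the number of experts included in the JSON payload. The endpoint URL, port, prompt, and other payload fields are placeholders, and (as noted above) the exact name and placement of the experts field varies between back ends, so check your server's API documentation.

```python
# Template: passing the number of experts in an API request's JSON payload.
# The URL, port, and field names are placeholders -- adjust to your backend's API.
import requests

API_URL = "http://localhost:5000/v1/completions"  # placeholder endpoint

payload = {
    "prompt": "Write the opening scene of a thriller set during a thunderstorm.",
    "max_tokens": 300,
    "temperature": 0.8,
    # The experts setting; the text above calls it "num_experts_used",
    # but the name and placement differ between back ends.
    "num_experts_used": 3,
}

response = requests.post(API_URL, json=payload, timeout=300)
response.raise_for_status()
print(response.json())
```
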
<B>SUGGESTION:</B>

The MOE models at my repo:

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

contain various examples, including example generations showing 2, 4, and 8 experts.

This will give you a better idea of what changes to expect when adjusting the number of experts, and of its effect on generation.