Update README.md
README.md CHANGED
---
license: apache-2.0
---

<H2>How To Set and Manage MOE - Mixture of Experts - Model Activation of Experts</H2>

This document explains how to set/change the number of active experts in Mixture of Experts (MOE) models in various LLM/AI apps, and includes links
to additional MOE models and other helpful resources.

---
<h2>LINKS:</h2>

---

<H2>Mixture of Experts Models - including Reasoning/Thinking:</H2>

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

<h2>Additional:</h2>

<B>#1 All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):</B>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ]

<B>#2 All Reasoning/Thinking Models - including MOEs - (collection) (source code for generating GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct use):</B>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ]

<B>#3 All Adapters (collection) - turn a "regular" model into a "thinking/reasoning" model:</B>

[ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ]

These collections are updated over time; the newest items are usually at the bottom of each collection.

---

<H2>Main Document - Setting Mixture of Experts in LLM/AI Apps</H2>

---
<B>Experts Activation / Models used to build this model:</B>

Special thanks to all the model makers for the models used in this MOE model.

To be updated...

The mixture of experts is set to 4 experts by default, but you can use 1, 2, 3, or 4.

This "team" has a Captain (the first listed model), and then all the team members contribute to the "token"
choice billions of times per second. Note that the Captain contributes too.

Think of 2, 3, or 4 (or more) master chefs in the kitchen, all competing to make the best dish for you.

This results in higher-quality generation and, in many cases, higher-quality instruction following too.

That means the power of every model is available during instruction following and output generation.
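As a rough, schematic illustration of the routing idea described above (a generic top-k MOE sketch, not this model's actual code; the numbers and the stand-in "expert" functions are made up), each token's hidden state is scored by a router, only the top "experts used" experts run, and their outputs are blended by the normalized router weights:

```python
# Generic top-k MOE routing sketch -- illustration only, not this model's code.
# A router scores every expert for the current token, the top "experts_used"
# experts are run, and their outputs are blended by the normalized router weights.
import math
import random

NUM_EXPERTS = 4    # experts built into the model
EXPERTS_USED = 2   # the value you change in LM Studio / KoboldCPP / llama.cpp


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token_state):
    # Router: one score per expert (random here; learned weights in a real model).
    router_scores = [random.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

    # Keep only the top-k experts for this token; the rest are skipped entirely.
    top_k = sorted(range(NUM_EXPERTS), key=lambda i: router_scores[i], reverse=True)[:EXPERTS_USED]
    weights = softmax([router_scores[i] for i in top_k])

    # Each selected expert transforms the token state; results are blended by weight.
    blended = 0.0
    for weight, idx in zip(weights, top_k):
        expert_output = (idx + 1) * token_state  # stand-in for a real expert network
        blended += weight * expert_output
    return blended


print(moe_layer(0.5))
```

Raising "experts used" simply means more of those expert networks contribute to every token choice, at the cost of a little more compute per token.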
NOTE:

You can also use just one "expert"; however, this means the model will randomly select an expert to use EACH TIME, resulting
in very different generation for each prompt / regeneration of a prompt.

CHANGING THE NUMBER OF EXPERTS:

In LMStudio (https://lmstudio.ai) you set the number of experts on the model "load" screen; in other LLM apps, look for a setting called "Experts" or "Number of Experts".

For Text-Generation-WebUI (https://github.com/oobabooga/text-generation-webui) you set the number of experts on the model loading screen/page.

For KoboldCPP (https://github.com/LostRuins/koboldcpp) version 1.8+, on the load screen click on "TOKENS";
you can set the number of experts on that page, and then launch the model.
For server.exe / llama-server.exe (llama.cpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md ),
add the following to the command line when starting the llama.cpp server (CLI):

"--override-kv llama.expert_used_count=int:3"

(no quotes, where "3" is the number of experts to use)
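For example (a sketch only - the model filename below is a placeholder for whatever GGUF file you are loading), a full launch command would look like:

```
llama-server -m your-moe-model.gguf --override-kv llama.expert_used_count=int:3
```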
When using an API, you set "num_experts_used" in the JSON payload (the exact field name may differ between back ends).
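As an illustration only - the endpoint URL/port and the payload field names below are assumptions, so check your back end's API documentation for the exact names it expects - a request that sets the expert count might look like this:

```python
# Illustration only: the URL/port and the "num_experts_used" field are assumptions;
# consult your back end's API documentation for the exact field names it expects.
import json
import urllib.request

payload = {
    "prompt": "Write a short scene set in a lighthouse during a storm.",
    "max_tokens": 400,            # field name varies by back end
    "temperature": 0.8,
    "num_experts_used": 3,        # number of experts to activate (back-end dependent)
}

request = urllib.request.Request(
    "http://localhost:5000/v1/completions",   # replace with your back end's endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))
```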
<B>SUGGESTION:</B>

The MOE models in my repo:

[ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]

contain various examples, including example generations with 2, 4, and 8 experts.

These will give you a better idea of what changes to expect when adjusting the number of experts
and of its effect on generation.