DavidAU committed · Commit 0e7bd63 · verified · 1 Parent(s): 9e231ff

Update README.md

Files changed (1): README.md +93 -3
README.md CHANGED
@@ -1,3 +1,93 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+ <H2>How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts</H2>
+
+ This document discusses how to set/change the Mixture of Experts in various LLM/AI apps and includes links
+ to additional MOE models and other helpful resources.
+
+ ---
+
+ <h2>LINKS:</h2>
+
+ ---
+
+ <H2>Mixture Of Experts Models - including Reasoning/Thinking:</H2>
+
+ [ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]
+
+ <h2>Additional:</h2>
+
+ <B>#1 All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):</b>
+
+ [ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ]
+
+ <B>#2 All Reasoning/Thinking Models - including MOEs - (collection) (Source code to generate GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct usage):</b>
+
+ [ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ]
+
+ <B>#3 All Adapters (collection) - Turn a "regular" model into a "thinking/reasoning" model:</b>
+
+ [ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ]
+
+ These collections will update over time. Newest items are usually at the bottom of each collection.
+
+ ---
+
+ <H2>Main Document - Setting Mixture Of Experts in LLM/AI apps</H2>
+
+ ---
+
+ <B>Experts Activation / Models used to build this model:</B>
+
+ Special Thanks to all the model makers for the models used in this MOE model:
+
+ To be updated...
+
+ The mixture of experts is set at 4 experts, but you can use 1, 2, 3, or 4.
+
+ This "team" has a Captain (the first listed model), and all the team members contribute to the "token"
+ choice billions of times per second. Note that the Captain contributes as well.
+
+ Think of 2, 3 or 4 (or more) master chefs in the kitchen all competing to make the best dish for you.
+
+ This results in higher quality generation.
+
+ In many cases it also results in higher quality instruction following.
+
+ That means the power of every model is available during instruction and output generation.
+
+ NOTE:
+
+ You can use one "expert" too; however, this means the model will randomly select an expert to use EACH TIME, resulting
+ in very different generation for each prompt / regen of a prompt.
+
+ CHANGING THE NUMBER OF EXPERTS:
+
+ You can set the number of experts in LMStudio (https://lmstudio.ai) at the "load" screen, and in other LLM apps by setting "Experts" or "Number of Experts".
+
+ For Text-Generation-Webui (https://github.com/oobabooga/text-generation-webui) you set the number of experts on the loading screen page.
+
+ For KoboldCPP (https://github.com/LostRuins/koboldcpp) Version 1.8+, on the load screen, click on "TOKENS";
+ you can set the number of experts on this page, and then launch the model.
+
+ For server.exe / llama-server.exe (Llamacpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md),
+ add the following to the command line used to start the "llamacpp server" (CLI):
+
+ "--override-kv llama.expert_used_count=int:3"
+
+ (no quotes, where "3" is the number of experts to use)
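+
+ For reference, a full launch command might look like the sketch below. The "--override-kv" flag is taken from the instructions above; the GGUF filename and port are placeholder values, so adjust them for your setup:
+
+ ```
+ # Example only: swap in your own MOE GGUF file and adjust the port as needed.
+ llama-server -m ./your-moe-model-Q4_K_M.gguf --port 8080 --override-kv llama.expert_used_count=int:3
+ ```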
+
+ When using an "API", you set "num_experts_used" in the JSON payload (this may be different for different back ends).
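+
+ As a rough sketch only, assuming a llama.cpp-style server on localhost:8080 (the endpoint and the exact parameter name vary by back end, "num_experts_used" below simply follows the wording above, and some back ends only accept the expert count at load time):
+
+ ```
+ # Hypothetical payload: verify the expert-count field name for your back end.
+ curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Write a short scene set in a thunderstorm.", "n_predict": 200, "num_experts_used": 3}'
+ ```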
+
+ <B>SUGGESTION:</B>
+
+ The MOE models at my repo:
+
+ [ https://huggingface.co/collections/DavidAU/d-au-moe-mixture-of-experts-models-see-also-source-coll-67579e54e1a2dd778050b928 ]
+
+ contain various examples, including example generations showing 2, 4, and 8 experts.
+
+ This will give you a better idea of what changes to expect when adjusting the number of experts
+ and the effect on generation.