---
license: apache-2.0
language:
- en
tags:
- How to use reasoning models.
- How to use thinking models.
- How to create reasoning models.
- deepseek
- reasoning
- reason
- thinking
- all use cases
- creative
- fiction writing
- plot generation
- sub-plot generation
- story generation
- scene continue
- storytelling
- fiction story
- romance
- all genres
- story
- writing
- vivid writing
- fiction
- roleplaying
- bfloat16
- float32
- float16
- role play
- sillytavern
- backyard
- lmstudio
- Text Generation WebUI
- llama 3
- mistral
- llama 3.1
- qwen 2.5
- context 128k
- mergekit
- merge
pipeline_tag: text-generation
---
<h2>How-To-Use-Reasoning-Thinking-Models-and-Create-Them - DOCUMENT</h2>
This document covers suggestions and methods to get the most out of "Reasoning/Thinking" models, including parameters/samplers,
System Prompt/Role settings, as well as links to "Reasoning/Thinking Models" and How to create your own (via adapters).
This is a live document and updates will occur often.
This document and the information contained in it can be used for ANY "Reasoning/Thinking" model - at my repo and/or other repos.
LINKS:
All Reasoning/Thinking Models - including MOEs - (collection) (GGUF):
[ https://huggingface.co/collections/DavidAU/d-au-reasoning-deepseek-models-with-thinking-reasoning-67a41ec81d9df996fd1cdd60 ]
All Reasoning/Thinking Models - including MOEs - (collection) (Source code to generate GGUF, EXL2, AWQ, GPTQ, HQQ, etc., and for direct usage):
[ https://huggingface.co/collections/DavidAU/d-au-reasoning-source-files-for-gguf-exl2-awq-gptq-67b296c5f09f3b49a6aa2704 ]
All Adapters (collection) - Turn a "regular" model into a "thinking/reasoning" model:
[ https://huggingface.co/collections/DavidAU/d-au-reasoning-adapters-loras-any-model-to-reasoning-67bdb1a7156a97f6ec42ce36 ]
These collections will update over time. Newest items are usually at the bottom of each collection.
---
<B>Support: Document about Parameters, Samplers and How to Set These:</b>
---
For additional generational support, general questions, and detailed parameter info and a lot more see also:
https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
---
<B>Support: AI Auto-Correct Engine (software patch for SillyTavern Front End)</b>
---
AI Auto-Correct Engine (built and programmed by DavidAU) auto-corrects AI generation in real-time, including modification of the
live generation stream to and from the AI... creating a two-way street of information that operates, changes, and edits automatically.
This system is for all GGUF, EXL2, HQQ, and other quants/compressions and full source models too.
Below is an example generation using a standard GGUF (and standard AI app), but auto-corrected via this engine.
The engine is an API level system.
Software Link:
https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE
---
<h2>MAIN: How To Use Reasoning / Thinking Models 101 </h2>
<B>Special Operation Instructions:</B>
---
<B>Template Considerations:</b>
For most reasoning/thinking models your template CHOICE is critical, as well as your System Prompt/Role setting(s) - below.
For most models you will need: Llama 3 Instruct or Chat, ChatML, and/or Command-R, OR the standard "Jinja Autoloaded Template"
(this is contained in the quant and will autoload in SOME AI apps).
The last one is usually the BEST CHOICE for a reasoning/thinking model (and in many cases other models too).
In LMStudio, this option appears in the lower left: "template to use" -> "Manual" or "Jinja Template".
This option/setting will vary from AI/LLM app to app.
A "Jinja" template is usually in the model's "source code" / "full precision" version and located usually in "tokenizer_config.json" file
(usually the very BOTTOM/END of the file) which is then "copied" to the GGUF quants and available to "AI/LLM" apps.
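If you want to inspect the template a model actually ships with, you can read it straight out of the source files. A minimal Python sketch (the file path is a placeholder - substitute your own model's folder):
<PRE>
import json

# Placeholder path - point this at the model's "source code" / full precision folder.
with open("path/to/model/tokenizer_config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# "chat_template" usually sits at the very bottom/end of the file.
print(config.get("chat_template", "No chat_template found in this file."))
</PRE>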
Here is a Qwen 2.5 version example (DO NOT USE: I have added spacing/breaks for readability):
<pre>
<small>
"chat_template": "{% if not add_generation_prompt is defined %}
{% set add_generation_prompt = false %}
{% endif %}
{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}
{%- for message in messages %}
{%- if message['role'] == 'system' %}
{% set ns.system_prompt = message['content'] %}
{%- endif %}
{%- endfor %}
{{bos_token}}
{{ns.system_prompt}}
{%- for message in messages %}
{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}
{{'<|User|>' + message['content']}}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is none %}
{%- set ns.is_tool = false -%}
{%- for tool in message['tool_calls']%}
{%- if not ns.is_first %}
{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n'
+ '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}
{%- set ns.is_first = true -%}
{%- else %}
{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>'
+ tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n'
+ '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is not none %}
{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}
{%- set ns.is_tool = false -%}
{%- else %}
{% set content = message['content'] %}
{% if '</think>' in content %}
{% set content = content.split('</think>')[-1] %}
{% endif %}
{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}
{%- endif %}{%- endif %}
{%- if message['role'] == 'tool' %}
{%- set ns.is_tool = true -%}
{%- if ns.is_output_first %}
{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- set ns.is_output_first = false %}
{%- else %}
{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- endif %}
{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}
{% endif %}
{% if add_generation_prompt and not ns.is_tool %}
{{'<|Assistant|>'}}
{% endif %}"
</small>
</pre>
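To see exactly what a Jinja template wraps around your messages, you can render it with the transformers library. A hedged sketch (the repo id is a placeholder; this works for any model whose tokenizer_config.json carries a "chat_template"):
<PRE>
from transformers import AutoTokenizer

# Placeholder repo id - substitute the reasoning/thinking model you are using.
tokenizer = AutoTokenizer.from_pretrained("some-org/some-reasoning-model")

messages = [
    {"role": "system", "content": "You are a deep thinking AI."},
    {"role": "user", "content": "Solve this riddle step by step: ..."},
]

# tokenize=False returns the rendered prompt string, so you can inspect
# exactly what the template adds around your messages.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
</PRE>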
In some cases you may need to set a "tokenizer" too - depending on the LLM/AI app - to work with specific reasoning/thinking models. Usually
this is NOT an issue, as it is auto-detected/set, but if you are getting strange results then this might be the cause.
Additional Section "General Notes" is at the end of this document.
TEMP/SETTINGS:
1. Set temp between 0 and .8; higher than this, "think" functions will activate differently. The most "stable" temp seems to be .6, with a variance of +/- 0.05. Lower it for more "logic" reasoning, raise it for more "creative" reasoning (max .8 or so). Also set context to at least 4096, to account for "thoughts" generation. (A sketch applying these settings follows this list.)
2. For temps 1+,2+ etc etc, thought(s) will expand, and become deeper and richer.
3. Set "repeat penalty" to 1.02 to 1.07 (recommended).
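Here is a minimal sketch of these settings using llama-cpp-python, one common way to run GGUF quants from code (the model path is a placeholder; parameter names follow that library):
<PRE>
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/reasoning-model-q4_k_m.gguf",  # placeholder path
    n_ctx=4096,  # at least 4096, to leave room for "thoughts" generation
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Brainstorm 5 uncommon plot ideas."}],
    temperature=0.6,      # most "stable" value; raise toward .8 for creative reasoning
    repeat_penalty=1.05,  # recommended range: 1.02 to 1.07
)
print(response["choices"][0]["message"]["content"])
</PRE>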
PROMPTS:
1. If you enter a prompt without implied "step by step" requirements (ie: Generate a scene, write a story, give me 6 plots for xyz), "thinking" (one or more) MAY activate AFTER first generation. (IE: Generate a scene -> scene will generate, followed by suggestions for improvement in "thoughts")
2. If you enter a prompt where "thinking" is stated or implied (ie: puzzle, riddle, solve this, brainstorm this idea, etc), the "thoughts" process(es) in Deepseek will activate almost immediately. Sometimes you need to regen for it to activate.
3. You will also get a lot of variations - some will continue the generation, others will talk about how to improve it, and some (ie generation of a scene) will cause the characters to "reason" about this situation. In some cases, the model will ask you to continue generation / thoughts too.
4. In some cases the model's "thoughts" may appear in the generation itself.
5. State the word size length max IN THE PROMPT for best results, especially for activation of "thinking." (see examples below)
6. Sometimes the "censorship" (from Deepseek) will activate, regen the prompt to clear it.
7. You may want to try your prompt once at "default" or "safe" temp settings, another at temp 1.2, and a third at 2.5 as an example. This will give you a broad range of "reasoning/thoughts/problem" solving.
GENERATION - THOUGHTS/REASONING:
1. It may take one or more regens for "thinking" to "activate." (depending on the prompt)
2. Model can generate a LOT of "thoughts". Sometimes the most interesting ones are 3,4,5 or more levels deep.
3. Many times the "thoughts" are unique and very different from one another.
4. Temp/rep pen settings can affect reasoning/thoughts too.
5. Change up or add directives/instructions or increase the detail level(s) in your prompt to improve reasoning/thinking.
6. Adding to your prompt: "think outside the box", "brainstorm X number of ideas", "focus on the most uncommon approaches" can drastically improve your results.
GENERAL SUGGESTIONS:
1. I have found opening a "new chat" per prompt works best with "thinking/reasoning activation", with temp .6, rep pen 1.05 ... THEN "regen" as required.
2. Sometimes the model will really, really get completely unhinged and you need to stop it manually.
3. Depending on your AI app, "thoughts" may appear with "< THINK >" and "</ THINK >" tags AND/OR the AI will generate "thoughts" directly in the main output or later output(s); a small parsing sketch follows this list.
4. Although quant q4KM was used for testing/examples, higher quants will provide better generation / more sound "reasoning/thinking".
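If your app hands you the raw text with the tags intact, a small Python sketch can separate the "thoughts" from the final answer (this assumes literal <think>...</think> tags; adjust the pattern to whatever your app actually emits):
<PRE>
import re

raw = "<think>Step 1... Step 2...</think>The final answer is 42."

# Collect every "thoughts" block, then strip them out to leave the answer.
thoughts = re.findall(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

print("THOUGHTS:", thoughts)
print("ANSWER:", answer)
</PRE>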
ADDITIONAL SUPPORT:
For additional generational support, general questions, and detailed parameter info and a lot more see also:
https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
---
<B>Recommended Settings (all) - For usage with "Think" / "Reasoning":</B>
temp: .6, rep pen: 1.07 (range: 1.02 to 1.12), rep pen range: 64, top_k: 40, top_p: .95, min_p: .05
Temp of 1+, 2+, 3+ will result in much deeper, richer and "more interesting" thoughts and reasoning.
Model behaviour may change with other parameter(s) and/or sampler(s) activated - especially the "thinking/reasoning" process.
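As a sketch, here is how these settings might look as a request to a local OpenAI-compatible endpoint (LMStudio, Text Generation WebUI, and others expose one; the URL/port and support for the non-standard sampler fields vary by app):
<PRE>
import requests

payload = {
    "model": "local-model",  # placeholder; some apps ignore this field
    "messages": [{"role": "user", "content": "Give me one plot for a mystery."}],
    "temperature": 0.6,
    "top_p": 0.95,
    # Non-standard OpenAI fields - accepted by some local servers only:
    "top_k": 40,
    "min_p": 0.05,
    "repeat_penalty": 1.07,
}

# Placeholder URL/port - check your app's server settings.
r = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])
</PRE>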
---
<B>System Role / System Prompt - Augment The Model's Power:</b>
---
If you set / have a system prompt this will affect both "generation" and "thinking/reasoning".
SIMPLE:
This is the generic system prompt used for generation and testing:
<PRE>
You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability.
</PRE>
This System Role/Prompt will give you "basic thinking/reasoning":
<PRE>
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
</PRE>
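In code, the system prompt is simply the "system" role message. A minimal sketch of how it slots into a chat request (pass "messages" to whatever completion call your app/library exposes):
<PRE>
messages = [
    {
        "role": "system",
        # Shortened here - paste the full system prompt from above.
        "content": "You are a deep thinking AI, you may use extremely long "
                   "chains of thought... enclose your thoughts inside "
                   "<think> </think> tags.",
    },
    {"role": "user", "content": "Write a scene where two rivals must cooperate."},
]
</PRE>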
ADVANCED:
Logical and Creative - these will SIGNIFICANTLY alter the output, and many times improve it too.
This will also cause more thoughts, deeper thoughts, and in many cases more detailed/stronger thoughts too.
Keep in mind you may also want to test the model with NO system prompt at all - including the default one.
Special Credit to: Eric Hartford, Cognitivecomputations ; these are based on his work.
CRITICAL:
Copy and paste exactly as shown, preserve formatting and line breaks.
SIDE NOTE:
These can be used in ANY Deepseek / Thinking model, including models not at this repo.
These, if used in a "non thinking" model, will alter model performance as well.
<PRE>
You are an AI assistant developed by the world wide community of ai experts.
Your primary directive is to provide well-reasoned, structured, and extensively detailed responses.
Formatting Requirements:
1. Always structure your replies using: <think>{reasoning}</think>{answer}
2. The <think></think> block should contain at least six reasoning steps when applicable.
3. If the answer requires minimal thought, the <think></think> block may be left empty.
4. The user does not see the <think></think> section. Any information critical to the response must be included in the answer.
5. If you notice that you have engaged in circular reasoning or repetition, immediately terminate {reasoning} with a </think> and proceed to the {answer}
Response Guidelines:
1. Detailed and Structured: Use rich Markdown formatting for clarity and readability.
2. Scientific and Logical Approach: Your explanations should reflect the depth and precision of the greatest scientific minds.
3. Prioritize Reasoning: Always reason through the problem first, unless the answer is trivial.
4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.
5. Maintain a professional, intelligent, and analytical tone in all interactions.
</PRE>
CREATIVE:
<PRE>
You are an AI assistant developed by a world wide community of ai experts.
Your primary directive is to provide highly creative, well-reasoned, structured, and extensively detailed responses.
Formatting Requirements:
1. Always structure your replies using: <think>{reasoning}</think>{answer}
2. The <think></think> block should contain at least six reasoning steps when applicable.
3. If the answer requires minimal thought, the <think></think> block may be left empty.
4. The user does not see the <think></think> section. Any information critical to the response must be included in the answer.
5. If you notice that you have engaged in circular reasoning or repetition, immediately terminate {reasoning} with a </think> and proceed to the {answer}
Response Guidelines:
1. Detailed and Structured: Use rich Markdown formatting for clarity and readability.
2. Creative and Logical Approach: Your explanations should reflect the depth and precision of the greatest creative minds first.
3. Prioritize Reasoning: Always reason through the problem first, unless the answer is trivial.
4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.
5. Maintain a professional, intelligent, and analytical tone in all interactions.
</PRE>
---
<B>General Notes:</b>
These are general notes that have been collected from my various repos and/or from various experiences with both specific models
and all models.
These notes may assist you with other model(s) operation(s).
---
From :
https://huggingface.co/DavidAU/L3.1-MOE-2X8B-Deepseek-DeepHermes-e32-uncensored-abliterated-13.7B-gguf
Due to how this model is configured, I suggest 2-4 generations depending on your use case(s) as each will vary widely in terms of context, thinking/reasoning and response.
Likewise, again depending on how your prompt is worded, it may take 1-4 regens for "thinking" to engage; however, sometimes the model will generate a response, then think/reason and improve on this response and continue again. This comes in part from the "Deepseek" parts of the model.
If you raise temp over .9, you may want to consider 4+ generations.
Note on "reasoning/thinking" this will activate depending on the wording in your prompt(s) and also temp selected.
There can also be variations because of how the models interact per generation.
Also, as general note:
If you are getting "long winded" generation/thinking/reasoning, you may want to break down the "problem(s)" to solve into one or more prompts. This will allow the model to focus more strongly, and in some cases give far better answers.
IE:
If you ask it to generate 6 general plots for a story VS generate one plot with these specific requirements - you may get better results.
---
From :
https://huggingface.co/DavidAU/Qwen2.5-MOE-6x1.5B-DeepSeek-Reasoning-e32-gguf
Temp of .4 to .8 is suggested, however it will still operate at much higher temps like 1.8, 2.6 etc.
Depending on your prompt, change temp SLOWLY: IE: .41, .42, .43 ... etc.
Likewise, because these are small models, they may do a tonne of "thinking"/"reasoning" and then "forget" to finish the task(s). In this case, prompt the model to "Complete the task XYZ with the 'reasoning plan' above".
Likewise, it may function better if you break down the reasoning/thinking task(s) into smaller pieces:
"IE: Instead of asking for 6 plots FOR theme XYZ, ASK IT for ONE plot for theme XYZ at a time".
Also set the context limit at 4k minimum; 8k+ suggested.
---