Minor fixes in example code snippets and chat template description
README.md (CHANGED)
````diff
@@ -385,7 +385,7 @@ git clone https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2
 
 vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
 --trust-remote-code \
---mamba_ssm_cache_dtype float32
+--mamba_ssm_cache_dtype float32 \
 --enable-auto-tool-choice \
 --tool-parser-plugin "NVIDIA-Nemotron-Nano-9B-v2/nemotron_toolcall_parser_no_streaming.py" \
 --tool-call-parser "nemotron_json"
````
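This hunk restores the trailing `\` after `--mamba_ssm_cache_dtype float32`; without it the shell ends the command at that flag and the tool-calling options below it are never passed to `vllm serve`. As a quick sanity check, a minimal sketch of calling the resulting OpenAI-compatible endpoint is shown below; the `http://localhost:8000/v1` address, the dummy API key, and the `get_weather` tool schema are illustrative assumptions, not part of the model card.

```python
# Minimal sketch: exercise the tool-calling server started by the command above.
# Assumptions: vLLM's default OpenAI-compatible address http://localhost:8000/v1,
# a dummy API key, and an illustrative "get_weather" tool schema.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-Nano-9B-v2",
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",
)

# With --enable-auto-tool-choice and the nemotron_json parser enabled, tool
# invocations come back as structured tool_calls instead of raw text.
print(response.choices[0].message.tool_calls)
```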
````diff
@@ -479,7 +479,7 @@ Okay, let's see. The user has a bill of $100 and wants to know the amount for an
 
 ## Prompt Format
 
-We follow the jinja chat template provided below. This template conditionally adds `<think>\n` to the start of the Assistant response if `/think` is found in the system prompt or
+We follow the jinja chat template provided below. This template conditionally adds `<think>\n` to the start of the Assistant response if `/think` is found in either the system prompt or any user message. If no reasoning signal is added, the model defaults to reasoning "on" mode. The chat template adds `<think></think>` to the start of the Assistant response if `/no_think` is found in the system prompt, thus enforcing reasoning on/off behavior.
 
 ```
 {%- set ns = namespace(enable_thinking = true) %}
````
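Since this hunk documents the `/think` and `/no_think` controls, a minimal sketch of inspecting the rendered prompt with `transformers` is shown below; the exact tail of the rendered string depends on the chat template shipped with the checkpoint, so the expected `<think>` markers are stated as assumptions rather than verified output.

```python
# Minimal sketch: render the chat template with and without the reasoning
# markers described above. Assumes the template bundled with the checkpoint
# behaves as documented; the user question is arbitrary.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "nvidia/NVIDIA-Nemotron-Nano-9B-v2", trust_remote_code=True
)

def render(system_text: str) -> str:
    messages = [
        {"role": "system", "content": system_text},
        {"role": "user", "content": "What is 2 + 2?"},
    ]
    return tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Reasoning on: the generation prompt is expected to end with an open "<think>\n".
print(repr(render("/think")[-40:]))

# Reasoning off: the template is expected to append an empty "<think></think>".
print(repr(render("/no_think")[-40:]))

# No marker at all: reasoning defaults to "on" per the updated description.
print(repr(render("You are a helpful assistant.")[-40:]))
```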