Instructions to use nvidia/NVIDIA-Nemotron-Nano-9B-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nvidia/NVIDIA-Nemotron-Nano-9B-v2 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="nvidia/NVIDIA-Nemotron-Nano-9B-v2", trust_remote_code=True) messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-Nano-9B-v2", trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("nvidia/NVIDIA-Nemotron-Nano-9B-v2", trust_remote_code=True) messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use nvidia/NVIDIA-Nemotron-Nano-9B-v2 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "nvidia/NVIDIA-Nemotron-Nano-9B-v2" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-Nano-9B-v2", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2
- SGLang
How to use nvidia/NVIDIA-Nemotron-Nano-9B-v2 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "nvidia/NVIDIA-Nemotron-Nano-9B-v2" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-Nano-9B-v2", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "nvidia/NVIDIA-Nemotron-Nano-9B-v2" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-Nano-9B-v2", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use nvidia/NVIDIA-Nemotron-Nano-9B-v2 with Docker Model Runner:
docker model run hf.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2
Update chat template for adding sys header even when no sys message, and allowing assistant response pre-fixing (#1)
Browse files- Update chat template for adding sys header even when no sys message, and allowing assistant response pre-fixing (47f6723b050726ddae2157b662307e41dfc78166)
- Fixing think/no think check (648fbeed472430d95319f06fb63e2b7a6f6806bd)
Co-authored-by: Ameya Sunil Mahabaleshwarkar <ameyasunilm@users.noreply.huggingface.co>
- tokenizer_config.json +3 -3
tokenizer_config.json
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
{
|
| 2 |
-
"add_bos_token":
|
| 3 |
"add_eos_token": false,
|
| 4 |
"add_prefix_space": false,
|
| 5 |
"added_tokens_decoder": {
|
|
@@ -8005,7 +8005,7 @@
|
|
| 8005 |
}
|
| 8006 |
},
|
| 8007 |
"bos_token": "<s>",
|
| 8008 |
-
"chat_template": "{%- for message in messages %}{%- set content = message['content'] %}{%- if message['role'] == 'system' %}{{- '<SPECIAL_10>System\n'
|
| 8009 |
"clean_up_tokenization_spaces": false,
|
| 8010 |
"eos_token": "<SPECIAL_12>",
|
| 8011 |
"extra_special_tokens": {},
|
|
@@ -8016,4 +8016,4 @@
|
|
| 8016 |
"model_max_length": 1000000000000000019884624838656,
|
| 8017 |
"tokenizer_class": "PreTrainedTokenizer",
|
| 8018 |
"unk_token": "<unk>"
|
| 8019 |
-
}
|
|
|
|
| 1 |
{
|
| 2 |
+
"add_bos_token": false,
|
| 3 |
"add_eos_token": false,
|
| 4 |
"add_prefix_space": false,
|
| 5 |
"added_tokens_decoder": {
|
|
|
|
| 8005 |
}
|
| 8006 |
},
|
| 8007 |
"bos_token": "<s>",
|
| 8008 |
+
"chat_template": "{%- set ns = namespace(enable_thinking=true) %}{%- for message in messages -%}{%- set content = message['content'] -%}{%- if message['role'] == 'user' or message['role'] == 'system' -%}{%- if '/think' in content -%}{%- set ns.enable_thinking = true -%}{%- elif '/no_think' in content -%}{%- set ns.enable_thinking = false -%}{%- endif -%}{%- endif -%}{%- endfor -%}{%- if messages[0]['role'] != 'system' -%}{%- set ns.non_tool_system_content = '' -%}{{- '<SPECIAL_10>System\n' -}}{%- else -%}{%- set ns.non_tool_system_content = messages[0]['content'].replace('/think', '').replace('/no_think', '').strip() -%}{{- '<SPECIAL_10>System\n' + ns.non_tool_system_content }}{%- endif -%}{%- if tools -%}{%- if ns.non_tool_system_content is defined and ns.non_tool_system_content != '' -%}{{- '\n\n' -}}{%- endif -%}{{- 'You can use the following tools to assist the user if required:' -}}{{- '\n<AVAILABLE_TOOLS>[' -}}{%- for tool in tools -%}{{- (tool.function if tool.function is defined else tool) | tojson -}}{{- ', ' if not loop.last else '' -}}{%- endfor -%}{{- ']</AVAILABLE_TOOLS>\n\n' -}}{{- 'If you decide to call any tool(s), use the following format:\n' -}}{{- '<TOOLCALL>[{{\"name\": \"tool_name1\", \"arguments\": \"tool_args1\"}}, ' -}}{{- '{{\"name\": \"tool_name2\", \"arguments\": \"tool_args2\"}}]</TOOLCALL>\n\n' -}}{{- 'The user will execute tool-calls and return responses from tool(s) in this format:\n' -}}{{- '<TOOL_RESPONSE>[{{\"tool_response1\"}}, {{\"tool_response2\"}}]</TOOL_RESPONSE>\n\n' -}}{{- 'Based on the tool responses, you can call additional tools if needed, correct tool calls if any errors are found, or just respond to the user.' -}}{%- endif -%}{{- '\n' -}}{%- set messages = messages[1:] if messages[0]['role'] == 'system' else messages -%}{%- if messages[-1]['role'] == 'assistant' -%}{%- set ns.last_turn_assistant_content = messages[-1]['content'].strip() -%}{%- set messages = messages[:-1] -%}{%- endif -%}{%- for message in messages %}{%- set content = message['content'] %}{%- if message['role'] == 'user' -%}{{- '<SPECIAL_11>User\n' + content.replace('/think', '').replace('/no_think', '').strip() + '\n' }}{%- elif message['role'] == 'tool' -%}{%- if loop.first or (messages[loop.index0 - 1].role != 'tool') -%}{{- '<SPECIAL_11>User\n' + '<TOOL_RESPONSE>[' }}{%- endif -%}{{- message['content'] -}}{{- ', ' if not loop.last and (messages[loop.index0 + 1].role == 'tool') else '' -}}{%- if loop.last or (messages[loop.index0 + 1].role != 'tool') -%}{{- ']</TOOL_RESPONSE>\n' -}}{%- endif -%}{%- elif message['role'] == 'assistant' -%}{%- if '</think>' in content -%}{%- set content = content.split('</think>')[1].strip() %}{%- endif -%}{{- '<SPECIAL_11>Assistant\n' + content.strip() }}{%- if message.tool_calls -%}{%- if content.strip() != '' -%}{{- '\n\n' -}}{%- endif -%}{{- '<TOOLCALL>[' -}}{%- for call in message.tool_calls -%}{%- set fn = call.function if call.function is defined else call -%}{{- '{\"name\": \"' + fn.name + '\", \"arguments\": ' -}}{%- if fn.arguments is string -%}{{- fn.arguments -}}{%- else -%}{{- fn.arguments | tojson -}}{%- endif -%}{{- '}' + (', ' if not loop.last else '') -}}{%- endfor -%}{{- ']</TOOLCALL>' -}}{%- endif -%}{{- '\n<SPECIAL_12>\n' -}}{%- endif -%}{%- endfor -%}{%- if add_generation_prompt -%}{{- '<SPECIAL_11>Assistant\n' -}}{%- if ns.enable_thinking is defined and ns.enable_thinking is false -%}{{- '<think></think>' -}}{%- else -%}{{- '<think>\n' -}}{%- endif -%}{%- if ns.last_turn_assistant_content is defined and ns.last_turn_assistant_content != '' -%}{{- ns.last_turn_assistant_content -}}{%- endif -%}{%- else -%}{%- if ns.last_turn_assistant_content is defined and ns.last_turn_assistant_content != '' -%}{{- '<SPECIAL_11>Assistant\n' -}}{%- if ns.enable_thinking is defined and ns.enable_thinking is false -%}{{- '<think></think>' -}}{%- else -%}{{- '<think>\n' -}}{%- endif -%}{{- ns.last_turn_assistant_content -}}{%- if continue_final_message is defined -%}{%- if continue_final_message is false -%}{{- '\n<SPECIAL_12>\n' -}}{%- endif -%}{%- else -%}{{- '\n<SPECIAL_12>\n' -}}{%- endif -%}{%- endif -%}{%- endif -%}",
|
| 8009 |
"clean_up_tokenization_spaces": false,
|
| 8010 |
"eos_token": "<SPECIAL_12>",
|
| 8011 |
"extra_special_tokens": {},
|
|
|
|
| 8016 |
"model_max_length": 1000000000000000019884624838656,
|
| 8017 |
"tokenizer_class": "PreTrainedTokenizer",
|
| 8018 |
"unk_token": "<unk>"
|
| 8019 |
+
}
|