Update README.md

README.md CHANGED:

````diff
@@ -248,13 +248,10 @@ print(outputs[0].outputs[0].text)
 
 ### **Use it with vLLM**
 
-The snippet below shows how to use this model with vLLM. Use the
+The snippet below shows how to use this model with vLLM. Use the latest version of vLLM and follow these instructions to build and install vLLM.
 
 ```shell
-
-cd vllm
-git checkout bf756321c72340466911b64602e88013d0210c1c
-VLLM_USE_PRECOMPILED=1 pip install -e .
+pip install -U "vllm>=0.10.1"
 ```
 
 Now you can run the server with:
@@ -265,6 +262,7 @@ vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
     --mamba_ssm_cache_dtype float32
 ```
 
+
 Note: Remember to add `--mamba_ssm_cache_dtype float32` to preserve accuracy. Without this option, the model's accuracy may degrade.
 
 #### Using Budget Control with a vLLM Server
````
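Once `vllm serve` is running, it exposes an OpenAI-compatible HTTP API. The sketch below builds a chat-completions request body for that API; the endpoint URL assumes vLLM's default port 8000, and the prompt text is illustrative.

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
# The model name matches the one passed to `vllm serve` above.
payload = {
    "model": "nvidia/NVIDIA-Nemotron-Nano-9B-v2",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "max_tokens": 64,
}
body = json.dumps(payload)
print(body)

# To actually send it (requires the server started above to be running):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body.encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```

Any OpenAI-compatible client library can be pointed at the same endpoint instead of building requests by hand.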