Update README.md
README.md (changed)

```diff
@@ -74,7 +74,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

 To handle extensive inputs exceeding 32,768 tokens, we utilize [YARN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.

-For deployment, we recommend using vLLM. You can enable long-context capabilities
+For deployment, we recommend using vLLM. You can enable the long-context capabilities by following these steps:

 1. **Install vLLM**: Ensure you have the latest version from the main branch of [vLLM](https://github.com/vllm-project/vllm).
```
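The long-context setup the edited sentence points to is typically done by adding a YaRN `rope_scaling` entry to the model's `config.json` before serving with vLLM. A minimal sketch, assuming a scaling factor of 4.0 over the native 32,768-token window (extending it to roughly 131,072 tokens); the exact keys and supported values can vary by model and vLLM version, so check the model card:

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

With this in place, the server's maximum context can then be raised accordingly (for example via vLLM's `--max-model-len` option) when launching the deployment.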