Can I run this with vLLM?
Also, I have 2× A100 80GB.
The requirements section says I need four of those, but since the model is 32B, I guess one A100 80GB might be enough.
https://github.com/vllm-project/vllm/pull/31471
I am currently working on it :)
I guess you can probably run this model on 2× A100 80GB.
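A rough back-of-the-envelope check of that guess, as a minimal sketch. It assumes exactly 32 billion parameters stored at 2 bytes each (bf16/fp16) and an illustrative 0.9 fraction of each GPU given to the server; it ignores the vision encoder entirely, and whatever is left over still has to hold the KV cache and activations. The function names are hypothetical helpers, not part of any library.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at `bytes_per_param` bytes
    (2.0 corresponds to bf16/fp16).
    """
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


def remaining_budget_gb(
    params_billion: float,
    num_gpus: int,
    gpu_memory_gb: float,
    utilization: float,
    bytes_per_param: float = 2.0,
) -> float:
    """Memory left for KV cache and activations after loading the weights.

    `utilization` is the fraction of each GPU handed to the inference server.
    A negative result means the weights alone do not fit in that budget.
    """
    budget = num_gpus * gpu_memory_gb * utilization
    return budget - weight_memory_gb(params_billion, bytes_per_param)


if __name__ == "__main__":
    for num_gpus in (1, 2):
        left = remaining_budget_gb(32, num_gpus, 80, utilization=0.9)
        print(f"{num_gpus} x 80 GB: about {left:.0f} GB left after weights")
```

Under these assumptions the weights take about 64 GB, which leaves roughly 8 GB on a single 80 GB card and roughly 80 GB across two. One card is therefore very tight, and two give comfortable room for the KV cache.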
If you use OmniServe, the vision encoder and the LLM run as separate services, so you need to cap vLLM’s GPU memory usage. By default vLLM will try to use almost all available GPU memory, so on 2× A100 80GB I’d run the 32B model with tensor parallelism 2 and set --gpu-memory-utilization to around 0.7. That leaves some headroom on each GPU, so you can still run the vision encoder on the same GPUs.
(https://github.com/NAVER-Cloud-HyperCLOVA-X/OmniServe)
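The settings described above could be assembled roughly like this. It is only a sketch: `build_vllm_serve_command` is a hypothetical helper that builds a `vllm serve` command line with the `--tensor-parallel-size` and `--gpu-memory-utilization` flags. It assumes a vLLM build that already supports this model (per this thread, that currently means the PR branch), and the model may need additional flags that are not shown here.

```python
import shlex


def build_vllm_serve_command(
    model: str,
    tensor_parallel_size: int = 2,
    gpu_memory_utilization: float = 0.7,
) -> list[str]:
    """Build the argv list for `vllm serve` with capped GPU memory usage."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return [
        "vllm", "serve", model,
        # Shard the model across this many GPUs.
        "--tensor-parallel-size", str(tensor_parallel_size),
        # Fraction of each GPU's memory that vLLM is allowed to claim.
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


if __name__ == "__main__":
    cmd = build_vllm_serve_command("naver-hyperclovax/HyperCLOVAX-SEED-Think-32B")
    # Print a copy-pasteable shell command instead of launching the server.
    print(shlex.join(cmd))
```

With a utilization of 0.7 on 80 GB cards, about 24 GB per GPU stays outside vLLM's budget, which is the headroom the vision encoder service would have to fit into.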
What about NVIDIA Spark - Docker - vLLM?
As long as you have more than 80 GB of VRAM, it should be fine, I guess :)
https://github.com/vllm-project/vllm/pull/31471
I've been working on adding support for text input and image+text input for this model. It hasn't been reviewed yet, so I'm not sure when it will be merged, but if you need it, feel free to download vLLM based on this PR and use it.
If you find any bugs, please let me know and I'll fix them right away!