getting error on vllm

#12
by maroahma - opened

(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] self.model = GptOssModel(
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] ^^^^^^^^^^^^
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 183, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 214, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] TransformerBlock(
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 183, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] self.attn = OAIAttention(config, prefix=f"{prefix}.attn")
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 110, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] self.attn = Attention(
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] ^^^^^^^^^^
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/attention/layer.py", line 176, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/v1/attention/backends/flash_attn.py", line 417, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] assert self.vllm_flash_attn_version == 3, (
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] AssertionError: Sinks are only supported in FlashAttention 3

I'm downloading the model now, so I haven't gotten that far yet, but according to the docs:

VLLM_USE_TRITON_FLASH_ATTN=1 # Flag to control if you want AI Inference Server to use Triton Flash Attention.
VLLM_FLASH_ATTN_VERSION=1 # Force AI Inference Server to use a specific flash-attention version (2 or 3), only valid with the flash-attention backend.

Citation: Red Hat

Edit: Apologies, this was not correct. Did you see the GitHub issue about compiling a wheel to make this work? Are you doing that?

It seems to require FlashAttention 3 because of the attention-sink logic. Forcing a different version doesn't work, and I don't think FlashAttention 3 works on all hardware.
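To make the hardware point concrete, here is a minimal sketch, assuming FlashAttention 3 kernels are only built for Hopper GPUs (compute capability 9.0). The table and the helper name `fa3_likely_supported` are illustrative and not part of vLLM; the capability values are the ones NVIDIA publishes for these cards.

```python
# Compute capabilities (major, minor) for the GPUs mentioned in this thread,
# plus H100 for comparison.
KNOWN_GPUS = {
    "A10G": (8, 6),  # Ampere
    "L40S": (8, 9),  # Ada Lovelace
    "A100": (8, 0),  # Ampere
    "H100": (9, 0),  # Hopper
}


def fa3_likely_supported(capability):
    """Assumption: FlashAttention 3 targets Hopper (major version 9) only."""
    major, _minor = capability
    return major == 9


for name, cap in KNOWN_GPUS.items():
    supported = fa3_likely_supported(cap)
    print(f"{name}: sm_{cap[0]}{cap[1]} -> FA3 likely supported: {supported}")
```

On a real machine, `torch.cuda.get_device_capability()` returns the same kind of `(major, minor)` tuple for the current device, so it can be passed straight to the helper.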

Okay, so I can't run it on an A10G or L40S?

(APIServer pid=86857) INFO 08-06 11:07:44 [api_server.py:1787] vLLM API server version 0.10.2.dev2+gf5635d62e.d20250806
(APIServer pid=86857) INFO 08-06 11:07:44 [utils.py:326] non-default args: {'model_tag': 'openai/gpt-oss-20b', 'port': 8284, 'model': 'openai/gpt-oss-20b', 'max_model_len': 65536, 'gpu_memory_utilization': 0.85, 'enable_prefix_caching': True}
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] Error in inspecting model architecture 'GptOssForCausalLM'
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] Traceback (most recent call last):
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 825, in _run_in_subprocess
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] returned.check_returncode()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/subprocess.py", line 502, in check_returncode
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] raise CalledProcessError(self.returncode, self.args, self.stdout,
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] subprocess.CalledProcessError: Command '['/home/ariq/anaconda3/envs/gptoss/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415]
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] The above exception was the direct cause of the following exception:
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415]
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] Traceback (most recent call last):
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 413, in _try_inspect_model_cls
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] return model.inspect_model_cls()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 384, in inspect_model_cls
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] return _run_in_subprocess(
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 828, in _run_in_subprocess
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] raise RuntimeError(f"Error raised in subprocess:\n"
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] RuntimeError: Error raised in subprocess:
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] :128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] Traceback (most recent call last):
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 198, in _run_module_as_main
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 88, in _run_code
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 849, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] _run()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 842, in _run
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] result = fn()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 385, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 388, in load_model_cls
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] mod = importlib.import_module(self.module_name)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/importlib/__init__.py", line 90, in import_module
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] return _bootstrap._gcd_import(name[level:], package, level)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 1387, in _gcd_import
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 1360, in _find_and_load
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 1331, in _find_and_load_unlocked
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 935, in _load_unlocked
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 999, in exec_module
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 488, in _call_with_frames_removed
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 22, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from vllm.model_executor.layers.rotary_embedding import get_rope
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 39, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from vllm.vllm_flash_attn.layers.rotary import apply_rotary_emb
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 10, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from ..ops.triton.rotary import apply_rotary # modified from original
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 8, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] import triton
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/__init__.py", line 8, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from .runtime import (
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/runtime/__init__.py", line 1, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from .autotuner import (Autotuner, Config, Heuristics, autotune, heuristics)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/runtime/autotuner.py", line 12, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from .jit import KernelInterface, JITFunction
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/runtime/jit.py", line 17, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from .driver import driver
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/runtime/driver.py", line 3, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from ..backends import backends, DriverBase
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/backends/__init__.py", line 47, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] backends: dict[str, Backend] = _discover_backends()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/backends/__init__.py", line 40, in _discover_backends
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] compiler = importlib.import_module(f"{ep.value}.compiler")
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/importlib/__init__.py", line 90, in import_module
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] return _bootstrap._gcd_import(name[level:], package, level)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/backends/amd/compiler.py", line 2, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from triton._C.libtriton import ir, passes, llvm, amd
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ImportError: /home/ariq/anaconda3/envs/gptoss/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/_C/libtriton.so)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415]
(APIServer pid=86857) Traceback (most recent call last):
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/bin/vllm", line 10, in
(APIServer pid=86857) sys.exit(main())
(APIServer pid=86857) ^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=86857) args.dispatch_function(args)
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=86857) uvloop.run(run_server(args))
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=86857) return __asyncio.run(
(APIServer pid=86857) ^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=86857) return runner.run(main)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=86857) return self._loop.run_until_complete(task)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=86857) return await main
(APIServer pid=86857) ^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1827, in run_server
(APIServer pid=86857) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1847, in run_server_worker
(APIServer pid=86857) async with build_async_engine_client(
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=86857) return await anext(self.gen)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 167, in build_async_engine_client
(APIServer pid=86857) async with build_async_engine_client_from_engine_args(
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=86857) return await anext(self.gen)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 193, in build_async_engine_client_from_engine_args
(APIServer pid=86857) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1043, in create_engine_config
(APIServer pid=86857) model_config = self.create_model_config()
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 889, in create_model_config
(APIServer pid=86857) return ModelConfig(
(APIServer pid=86857) ^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 127, in __init__
(APIServer pid=86857) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=86857) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=86857) Value error, Model architectures ['GptOssForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
(APIServer pid=86857) For further information visit https://errors.pydantic.dev/2.12/v/value_error

I followed the installation instructions provided, but it keeps failing with this error. Is there a solution?

You should follow the official guide and use uv instead of pip.
This fixed it for me.
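For anyone debugging the same log: the final ImportError says the `libstdc++.so.6` inside the conda environment does not export `GLIBCXX_3.4.30`, which the Triton binary requires. Below is a minimal sketch for checking which GLIBCXX versions a given library file contains, roughly what `strings libstdc++.so.6 | grep GLIBCXX` does. The function names `glibcxx_versions` and `provides` are made up for illustration.

```python
import re
from pathlib import Path


def glibcxx_versions(lib_path):
    """Return the GLIBCXX_x.y[.z] version tags embedded in a shared library."""
    data = Path(lib_path).read_bytes()
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    # Sort numerically so that 3.4.30 sorts after 3.4.9.
    return sorted(
        (tag.decode() for tag in tags),
        key=lambda version: tuple(int(part) for part in version.split(".")),
    )


def provides(lib_path, required="3.4.30"):
    """True if the library file contains the required GLIBCXX version tag."""
    return required in glibcxx_versions(lib_path)
```

Calling `provides(...)` on the environment's own `lib/libstdc++.so.6` and on the system copy shows which of the two is too old. This only diagnoses the mismatch; it does not explain why switching to uv fixed it for me.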

Same issue.

I am also using an A10G and hit this issue.

Someone opened a bug on vLLM. Please fix this issue; as it stands, most AWS EC2 GPU instances are effectively unable to run gpt-oss-20b.

Same issue.

It finally works using vLLM API server version v0.10.2.dev2+gf5635d62e.d20250807 on an Azure A100 VM:

$ VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve openai/gpt-oss-120b
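If you launch the server from a Python script rather than a shell, the same setting can be passed through the process environment. This is a minimal sketch: the helper name `serve_invocation` is hypothetical, and the backend value is simply the one from the command above.

```python
import os


def serve_invocation(model, backend="TRITON_ATTN_VLLM_V1", base_env=None):
    """Build (env, argv) for `vllm serve` with the attention backend pinned."""
    env = dict(os.environ if base_env is None else base_env)
    env["VLLM_ATTENTION_BACKEND"] = backend
    return env, ["vllm", "serve", model]


env, argv = serve_invocation("openai/gpt-oss-120b")
print(argv, env["VLLM_ATTENTION_BACKEND"])
# To actually start the server:
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```

Copying the existing environment first keeps PATH and the CUDA-related variables intact, so only the backend selection changes.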
