Getting this error on vLLM:
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] self.model = GptOssModel(
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] ^^^^^^^^^^^^
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 183, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 214, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] TransformerBlock(
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 183, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] self.attn = OAIAttention(config, prefix=f"{prefix}.attn")
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 110, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] self.attn = Attention(
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] ^^^^^^^^^^
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/attention/layer.py", line 176, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] self.impl = impl_cls(num_heads, head_size, scale, num_kv_heads,
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] File "/home/ec2-user/.pyenv/versions/gpt-oss2/lib/python3.12/site-packages/vllm/v1/attention/backends/flash_attn.py", line 417, in __init__
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] assert self.vllm_flash_attn_version == 3, (
(EngineCore_0 pid=1949496) (VllmWorker TP3 pid=1949511) ERROR 08-05 18:24:55 [multiproc_executor.py:559] AssertionError: Sinks are only supported in FlashAttention 3
I'm downloading the model now, so I haven't gotten that far yet, but according to the docs:
VLLM_USE_TRITON_FLASH_ATTN=1 # Flag to control if you want AI Inference Server to use Triton Flash Attention.
VLLM_FLASH_ATTN_VERSION=1 # Force AI Inference Server to use a specific flash-attention version (2 or 3), only valid with the flash-attention backend.
Citation: Red Hat
Edit: Apologies, this was not correct. Did you see the GitHub issue about compiling a wheel to make this work? Are you doing that?
It seems to require FlashAttention 3 because of the attention sink logic. Forcing a different version doesn't work, and I don't think FlashAttention 3 works on all hardware.
Okay, so I can't run it on an A10G or L40S?
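For what it's worth, here is a rough Python sketch of the hardware question. It assumes FlashAttention 3 kernels are only available on Hopper-class GPUs (compute capability 9.x), which is my understanding and not something the log above states. The helper name `likely_supports_fa3` and the small GPU table are made up for illustration and are not part of vLLM.

```python
# Compute capabilities (major, minor) for the GPUs mentioned in this thread,
# plus H100 for comparison.
KNOWN_GPUS = {
    "A10G": (8, 6),
    "L40S": (8, 9),
    "A100": (8, 0),
    "H100": (9, 0),
}


def likely_supports_fa3(major: int, minor: int) -> bool:
    """Hypothetical helper: guess whether FlashAttention 3 can run.

    ASSUMPTION: FA3 is treated as Hopper-only (major == 9). Check the
    vLLM / FlashAttention release notes for your exact versions.
    """
    return major == 9


def report() -> dict[str, bool]:
    """Map each known GPU name to the FA3 guess."""
    return {name: likely_supports_fa3(*cap) for name, cap in KNOWN_GPUS.items()}


# On a real machine with PyTorch installed, the capability can be read with
# torch.cuda.get_device_capability(), which returns a (major, minor) tuple.
```

Under that assumption, the A10G, L40S, and A100 would all fail the FA3 assertion, so they would need a different attention backend rather than a different FlashAttention version.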
(APIServer pid=86857) INFO 08-06 11:07:44 [api_server.py:1787] vLLM API server version 0.10.2.dev2+gf5635d62e.d20250806
(APIServer pid=86857) INFO 08-06 11:07:44 [utils.py:326] non-default args: {'model_tag': 'openai/gpt-oss-20b', 'port': 8284, 'model': 'openai/gpt-oss-20b', 'max_model_len': 65536, 'gpu_memory_utilization': 0.85, 'enable_prefix_caching': True}
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] Error in inspecting model architecture 'GptOssForCausalLM'
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] Traceback (most recent call last):
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 825, in _run_in_subprocess
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] returned.check_returncode()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/subprocess.py", line 502, in check_returncode
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] raise CalledProcessError(self.returncode, self.args, self.stdout,
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] subprocess.CalledProcessError: Command '['/home/ariq/anaconda3/envs/gptoss/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415]
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] The above exception was the direct cause of the following exception:
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415]
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] Traceback (most recent call last):
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 413, in _try_inspect_model_cls
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] return model.inspect_model_cls()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 384, in inspect_model_cls
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] return _run_in_subprocess(
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 828, in _run_in_subprocess
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] raise RuntimeError(f"Error raised in subprocess:\n"
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] RuntimeError: Error raised in subprocess:
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] :128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] Traceback (most recent call last):
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 198, in _run_module_as_main
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 88, in _run_code
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 849, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] _run()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 842, in _run
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] result = fn()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 385, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/registry.py", line 388, in load_model_cls
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] mod = importlib.import_module(self.module_name)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/importlib/init.py", line 90, in import_module
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] return _bootstrap._gcd_import(name[level:], package, level)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 1387, in _gcd_import
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 1360, in _find_and_load
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 1331, in _find_and_load_unlocked
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 935, in _load_unlocked
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 999, in exec_module
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "", line 488, in _call_with_frames_removed
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/models/gpt_oss.py", line 22, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from vllm.model_executor.layers.rotary_embedding import get_rope
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/model_executor/layers/rotary_embedding.py", line 39, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from vllm.vllm_flash_attn.layers.rotary import apply_rotary_emb
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/vllm_flash_attn/layers/rotary.py", line 10, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from ..ops.triton.rotary import apply_rotary # modified from original
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/vllm_flash_attn/ops/triton/rotary.py", line 8, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] import triton
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/init.py", line 8, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from .runtime import (
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/runtime/init.py", line 1, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from .autotuner import (Autotuner, Config, Heuristics, autotune, heuristics)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/runtime/autotuner.py", line 12, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from .jit import KernelInterface, JITFunction
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/runtime/jit.py", line 17, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from .driver import driver
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/runtime/driver.py", line 3, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from ..backends import backends, DriverBase
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/backends/init.py", line 47, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] backends: dict[str, Backend] = _discover_backends()
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/backends/init.py", line 40, in _discover_backends
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] compiler = importlib.import_module(f"{ep.value}.compiler")
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/importlib/init.py", line 90, in import_module
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] return _bootstrap._gcd_import(name[level:], package, level)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/backends/amd/compiler.py", line 2, in
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] from triton._C.libtriton import ir, passes, llvm, amd
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415] ImportError: /home/ariq/anaconda3/envs/gptoss/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/triton/_C/libtriton.so)
(APIServer pid=86857) ERROR 08-06 11:07:56 [registry.py:415]
(APIServer pid=86857) Traceback (most recent call last):
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/bin/vllm", line 10, in
(APIServer pid=86857) sys.exit(main())
(APIServer pid=86857) ^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=86857) args.dispatch_function(args)
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=86857) uvloop.run(run_server(args))
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/uvloop/init.py", line 109, in run
(APIServer pid=86857) return __asyncio.run(
(APIServer pid=86857) ^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=86857) return runner.run(main)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=86857) return self._loop.run_until_complete(task)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/uvloop/init.py", line 61, in wrapper
(APIServer pid=86857) return await main
(APIServer pid=86857) ^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1827, in run_server
(APIServer pid=86857) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1847, in run_server_worker
(APIServer pid=86857) async with build_async_engine_client(
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=86857) return await anext(self.gen)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 167, in build_async_engine_client
(APIServer pid=86857) async with build_async_engine_client_from_engine_args(
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=86857) return await anext(self.gen)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 193, in build_async_engine_client_from_engine_args
(APIServer pid=86857) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1043, in create_engine_config
(APIServer pid=86857) model_config = self.create_model_config()
(APIServer pid=86857) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 889, in create_model_config
(APIServer pid=86857) return ModelConfig(
(APIServer pid=86857) ^^^^^^^^^^^^
(APIServer pid=86857) File "/home/ariq/anaconda3/envs/gptoss/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 127, in init
(APIServer pid=86857) s.pydantic_validator.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=86857) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=86857) Value error, Model architectures ['GptOssForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
(APIServer pid=86857) For further information visit https://errors.pydantic.dev/2.12/v/value_error
I have been following the installation instructions provided, but it keeps failing with this error. Is there any solution?
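Note that the root cause in this second log is the `ImportError` about `GLIBCXX_3.4.30`, not the FlashAttention assertion: the `libstdc++.so.6` inside the conda environment appears to be older than what `libtriton.so` was built against. Below is a minimal Python sketch for checking which `GLIBCXX_*` versions a given `libstdc++.so.6` advertises. The function names are hypothetical, and scanning the raw bytes for version strings is a simplification of what `strings libstdc++.so.6 | grep GLIBCXX` does.

```python
import re
from pathlib import Path

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.30 in raw bytes.
_GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(blob: bytes) -> list[tuple[int, ...]]:
    """Return the sorted, de-duplicated GLIBCXX versions found in blob."""
    found = {
        tuple(int(part) for part in m.group(1).decode().split("."))
        for m in _GLIBCXX_RE.finditer(blob)
    }
    return sorted(found)


def provides(blob: bytes, wanted: str) -> bool:
    """True if blob contains the exact tag, e.g. 'GLIBCXX_3.4.30'."""
    target = tuple(int(p) for p in wanted.removeprefix("GLIBCXX_").split("."))
    return target in glibcxx_versions(blob)


def check_library(path: str, wanted: str = "GLIBCXX_3.4.30") -> bool:
    """Read a shared library from disk and test for the wanted tag."""
    return provides(Path(path).read_bytes(), wanted)
```

For example, `check_library("/home/ariq/anaconda3/envs/gptoss/lib/libstdc++.so.6")` would presumably return `False` for the environment in the log. If so, a commonly suggested remedy is to update libstdc++ inside the environment (for instance via the conda-forge `libstdcxx-ng` package), though I have not verified that on this setup.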
Same issue.
I am also using an A10G and hit the same issue.
Someone has opened a bug on vLLM. Please fix this; as it stands, most AWS EC2 GPU instances are effectively useless for running gpt-oss-20b.
Same issue.
It finally works with vLLM API server version v0.10.2.dev2+gf5635d62e.d20250807 on an Azure A100 VM:
$ VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve openai/gpt-oss-120b