Not able to deploy gpt-oss-20b model in A100s

#124
by saiadityavzure - opened

Not able to deploy the gpt-oss-20b model on two A100s (40 GB each).
Any details on how to deploy?

Hey Community,

I have two A100 GPUs (40 GB each) and I’m trying to deploy the gpt-oss-20b model. However, I’m encountering an FA3 (FlashAttention 3) error both with NVIDIA NIM and with other providers.

I’ll post the exact error details below for reference. Any guidance, troubleshooting tips, or insights would be greatly appreciated.

Thanks in advance for your support!

This is the error we are getting, shown in the screenshot below.
We are using vLLM as the serving platform.
image.png

@saiadityavzure check out the Triton kernel attention backend. The issue is that the A100 is Ampere architecture, which does not support MXFP4 natively, so the FA3 (FlashAttention 3) path fails. Here is the link from the vLLM docs:
https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#quickstart
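As a rough sketch (please verify the exact flags and env var names against the recipe linked above, since they change between vLLM versions), the idea is to serve the model tensor-parallel across the two A100s while forcing the Triton attention backend instead of FA3:

```shell
# Hedged sketch, not a verified command line: force vLLM's Triton attention
# backend (FA3 requires Hopper; A100 is Ampere/SM80) and split the model
# across both 40 GB A100s with tensor parallelism.
# The env var name and value below are taken from the vLLM GPT-OSS recipe;
# confirm them for your installed vLLM version.
export VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1

vllm serve openai/gpt-oss-20b \
  --tensor-parallel-size 2
```

Once the server is up, it exposes the usual OpenAI-compatible endpoint on port 8000, so any OpenAI-style client should work against it.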
