Not able to deploy gpt-oss-20b model in A100s

#124
by saiadityavzure - opened

Not able to deploy the gpt-oss-20b model on two A100s (40 GB each).
Any details on how to deploy?

Hey Community,

I have two A100 GPUs (40 GB each) and I’m trying to deploy the gpt-oss-20b model. However, I’m encountering an FA3 (FlashAttention 3) error both with NVIDIA NIM and with other providers.

I’ll post the exact error details below for reference. Any guidance, troubleshooting tips, or insights would be greatly appreciated.

Thanks in advance for your support!

This is the error we are getting, shown in the screenshot below.
We are using vLLM as the serving platform.
image.png

@saiadityavzure check out the Triton kernel attention backend. The issue is that the A100 is Ampere architecture, which does not support MXFP4 natively, so the FA3 (FlashAttention 3) path fails. Here is the link from the vLLM docs:
https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#quickstart
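As a rough sketch (please verify the exact flags and env var names against the recipe linked above, since they change between vLLM versions), the idea is to serve the model tensor-parallel across the two A100s while forcing the Triton attention backend instead of FA3:

```shell
# Hedged sketch, not a verified command line: force vLLM's Triton attention
# backend (FA3 requires Hopper; A100 is Ampere/SM80) and split the model
# across both 40 GB A100s with tensor parallelism.
# The env var name and value below are taken from the vLLM GPT-OSS recipe;
# confirm them for your installed vLLM version.
export VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1

vllm serve openai/gpt-oss-20b \
  --tensor-parallel-size 2
```

Once the server is up, it exposes the usual OpenAI-compatible endpoint on port 8000, so any OpenAI-style client should work against it.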
