Update Documentation to remove A100 support

#10
by ndurkee - opened

Hi there, I'm trying to run the FP8/NVFP4 quants using vLLM on an A100 (compute capability 8.0), but I'm hitting an error stating that CC 8.9+ is required.

I noticed the documentation mentions A100 support, but it seems this might only apply to the FP16 version.

  • Request: Could you clarify whether FP8/NVFP4 is intended to be supported on Ampere (CC 8.0) via vLLM?
  • Suggestion: If not, it might be worth updating the README to specify the CC 8.9+ requirement for these specific quants, to save other users some troubleshooting time.
  • Question: Are there plans for an FP8(Q4) version compatible with Ampere, or should I stick with bitsandbytes/GPTQ on this hardware?
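For anyone hitting the same error, a quick pre-flight check can confirm whether a GPU meets the CC 8.9+ requirement before attempting to load these quants. This is a minimal sketch (the `supports_fp8` helper is hypothetical, not part of vLLM); the 8.9 threshold matches the error message reported above.

```python
def supports_fp8(major: int, minor: int) -> bool:
    """Return True if the GPU's compute capability meets the CC 8.9+ requirement
    reported by vLLM for these FP8/NVFP4 quants."""
    return (major, minor) >= (8, 9)


if __name__ == "__main__":
    # Probe the local GPU if torch is available; otherwise just report and exit.
    try:
        import torch
        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability()
            print(f"CC {major}.{minor} -> FP8/NVFP4 supported: {supports_fp8(major, minor)}")
        else:
            print("No CUDA device visible")
    except ImportError:
        print("torch not installed; skipping GPU probe")
```

An A100 reports CC 8.0, so the check comes back False there; Ada-generation cards (e.g. L40S) report 8.9 and Hopper (H100) reports 9.0, both of which pass.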

Thanks for the great work on these models!

Agreed - it's unclear why the README states A100 support if the model is not compatible with that hardware.
