amd/Llama-3.1-405B-Instruct-MXFP4-Preview

8-bit precision

linzhao-amd committed on Jun 27

Commit adaf803 · verified · 1 Parent(s): b2f44dc

Update README.md

Files changed (1)

README.md +3 -0

@@ -37,6 +37,9 @@ python3 quantize_quark.py --model_dir "meta-llama/Meta-Llama-3.1-405B-Instruct"
 ```
 # Deployment
 ## Evaluation

 ```
 # Deployment
+### Use with vLLM
+This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.
 ## Evaluation
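
The added README text says the model can be deployed with the vLLM backend but does not show a command. As a rough sketch only, the helper below assembles a `vllm serve` invocation for this repository. The model ID is taken from the repository name on this page. The function name `build_vllm_serve_command` and the tensor-parallel size of 8 are assumptions made for illustration, not values from the commit. Whether this MXFP4 checkpoint loads at all depends on the installed vLLM version and hardware, which the commit does not specify.

```python
import shlex

# Repository name as shown on this page.
MODEL_ID = "amd/Llama-3.1-405B-Instruct-MXFP4-Preview"


def build_vllm_serve_command(
    model_id: str = MODEL_ID,
    tensor_parallel_size: int = 8,  # assumption: 8 GPUs; not stated in the commit
) -> list[str]:
    """Return the argv list for a `vllm serve` call (hypothetical helper)."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    return [
        "vllm",
        "serve",
        model_id,
        "--tensor-parallel-size",
        str(tensor_parallel_size),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_vllm_serve_command()))
```

Printing the command rather than executing it keeps the sketch free of any dependency on vLLM being installed; the resulting list can be passed to `subprocess.run` once the settings have been checked against the target environment.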