pdufour/Qwen2-VL-2B-Instruct-ONNX-Q4-F16

pdufour committed on Nov 19, 2024

Commit 1c632d8 · verified · 1 Parent(s): 5331cf8

Update EXPORT.md

Files changed (1)

EXPORT.md +4 -1

@@ -36,19 +36,22 @@ Exports model part E by running QwenVL_Export_E.py.
 Reduces ONNX model size by removing unnecessary elements for optimized deployment.
 **quantize**
 Quantizes all model parts (A, B, C, D, and E) to optimize size and performance.
 **quantize-%**
 Quantizes a specific model part (% can be A, B, C, D, or E) with targeted configurations.
 **clean-large-files**
 Deletes ONNX files larger than 2 GB from the destination directory so that only models that load in ONNX environments remain.
 **fix-gpu-buffers**
 Applies fixes to GPU buffers in ONNX files for part E to ensure GPU memory compatibility.
 **all**
 Alias for all-in-one to run the full ONNX model preparation pipeline.
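
The `clean-large-files` target described above can be sketched in Python as follows. This is a minimal illustration, not the Makefile's actual recipe: the function names, the `dry_run` flag, and the exact byte threshold (2 GiB) are assumptions made for the example. The 2 GB cutoff presumably reflects the protobuf size limit for single-file ONNX models, though EXPORT.md does not say so.

```python
from pathlib import Path

# Assumption: "2GB" is taken to mean 2 * 1024**3 bytes; the real target may use another cutoff.
TWO_GB = 2 * 1024**3


def find_large_onnx_files(dest_dir, limit_bytes=TWO_GB):
    """Return the .onnx files under dest_dir whose size exceeds limit_bytes."""
    return sorted(
        p for p in Path(dest_dir).rglob("*.onnx")
        if p.is_file() and p.stat().st_size > limit_bytes
    )


def clean_large_files(dest_dir, limit_bytes=TWO_GB, dry_run=True):
    """Delete oversized .onnx files; with dry_run=True, only report them."""
    large = find_large_onnx_files(dest_dir, limit_bytes)
    for path in large:
        if not dry_run:
            path.unlink()  # hypothetical equivalent of the target's delete step
    return large
```

Calling `clean_large_files("onnx_out")` (a hypothetical directory name) lists the candidates without deleting anything; passing `dry_run=False` removes them. Defaulting to a dry run is a design choice for this sketch, since deleting exported models is hard to undo.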
