Image resizing for vision encoder ONNX export
#75 by Jrd100
Hi,
We're trying to export Moondream's vision encoder to ONNX but are running into a shape mismatch in the patch_emb layer:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (672x224 and 588x1152).
We already tried (378, 378) and (384, 384) and got a similar error.
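Our working theory: the 588 in the error factors as 3 × 14 × 14, which suggests patch_emb expects 14×14 patches flattened across 3 channels, while our 672×224 side factors as (3 × 224) × 224, i.e. we may be feeding the raw image tensor instead of patchified input. Below is the shape sanity check we sketched; the patch and image sizes are guesses on our part, not confirmed values:

import torch
import torch.nn.functional as F

# Guessed values: 588 = 3 * 14 * 14 hints at 14x14 patches over 3
# channels; 378 = 27 * 14 is one image size that tiles evenly.
patch_size = 14
image_size = 378

x = torch.randn(1, 3, image_size, image_size)

# Rearrange (B, C, H, W) into (B, num_patches, C * P * P) via unfold.
patches = F.unfold(x, kernel_size=patch_size, stride=patch_size)  # (1, 588, 729)
patches = patches.transpose(1, 2)                                 # (1, 729, 588)
print(patches.shape)  # last dim 588 would match patch_emb's mat2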
Code snippet:

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
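For context, our export call is roughly the sketch below. The repo id and the vision_encoder attribute are how we reference things locally and may not match the actual module path:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM

# Repo id and encoder attribute name are assumptions on our side.
model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True
)

# Preprocess a sample image with the transform above.
image = preprocess(Image.open("test.jpg").convert("RGB")).unsqueeze(0)

torch.onnx.export(
    model.vision_encoder,        # hypothetical handle to the encoder module
    image,                       # example input after preprocessing
    "vision_encoder.onnx",
    input_names=["pixel_values"],
    output_names=["image_embeds"],
    dynamic_axes={"pixel_values": {0: "batch"}},
)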
Can you please clarify:
- The correct input image size for the vision encoder?
- The patch size used?
- The expected flattened patch dimension for patch_emb?
- Any required preprocessing steps we might be missing?
This will help us align our preprocessing with the model's architecture.
Thanks