Contains the three models presented in the paper "DIVE-Doc: Downscaling foundational Image Visual Encoder into hierarchical architecture for DocVQA".
Demo of the DIVE-Doc model - VisionDocs@ICCV2025