# ICLR25: Incorporating Visual Correspondence into Diffusion Model for Visual Try-On

This is the official repository for the [paper](*) "Incorporating Visual Correspondence into Diffusion Model for Visual Try-On".
## Overview

We propose to explicitly capitalize on visual correspondence as a prior to tame the diffusion process, instead of simply feeding the whole garment into the UNet as an appearance reference.
## Installation

Create a conda environment and install the requirements:

```
conda create -n SPM-Diff python==3.9.0
conda activate SPM-Diff
cd SPM-Diff-main
pip install -r requirements.txt
```
## Semantic Point Matching

In SPM, a set of semantic points on the garment is first sampled and matched to the corresponding points on the target person via local flow warping. These 2D cues are then augmented into 3D-aware cues with depth/normal maps, which act as semantic point matching supervision for the diffusion model.
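The matching step can be sketched as follows. This is a minimal NumPy illustration of the idea, not the repository's implementation: the helper names, array layouts, and the dense flow field are all assumptions for exposition.

```
import numpy as np

def sample_garment_points(garment_mask, num_points=128, seed=0):
    # Sample 2D semantic points uniformly inside the binary garment mask.
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(garment_mask)
    idx = rng.choice(len(xs), size=min(num_points, len(xs)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)            # (N, 2) integer (x, y)

def warp_points(points, flow):
    # Move each garment point by the local flow vector at its location;
    # flow[y, x] = (dx, dy) displacement toward the target person image.
    warped = points + flow[points[:, 1], points[:, 0]]
    h, w = flow.shape[:2]
    return np.clip(np.rint(warped).astype(int), 0, [w - 1, h - 1])

def lift_to_3d(points, depth, normals):
    # Augment matched 2D points with depth/normal cues into 3D-aware cues.
    d = depth[points[:, 1], points[:, 0], None]            # (N, 1) depth values
    n = normals[points[:, 1], points[:, 0]]                # (N, 3) surface normals
    return np.concatenate([points.astype(np.float32), d, n], axis=1)  # (N, 6)
```

Presumably, features of this kind are what the `point` folder in the dataset layout below holds, but the exact on-disk format should be checked against the preprocessing instructions.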
You can directly download the [Semantic Point Feature](*) or follow the instructions in [preprocessing.md](*) to extract the Semantic Point Feature yourself.
## Dataset

You can download the VITON-HD dataset from [here](https://github.com/xiezhy6/GP-VTON). <br>
For inference, the following dataset structure is required: <br>
```
test
|-- image
|-- masked_vton_img
|-- warp-cloth
|-- cloth
|-- cloth_mask
|-- point
```
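As a quick sanity check before running inference, a small script like the following can verify the layout; the root path is an assumption, so adjust it to wherever you unpacked the dataset.

```
from pathlib import Path

# Adjust to wherever the VITON-HD test split was unpacked (path is an assumption).
root = Path("data/viton-hd/test")

# The sub-folders mirror the structure listed above.
for sub in ("image", "masked_vton_img", "warp-cloth", "cloth", "cloth_mask", "point"):
    if not (root / sub).is_dir():
        raise FileNotFoundError(f"missing folder: {root / sub}")
print("dataset layout looks OK")
```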
## Inference

Please download the pre-trained model from [Google Link](*), then run:

```
sh inference.sh
```
## Acknowledgement

Thanks to the contributions of [LaDI-VTON](https://github.com/miccunifi/ladi-vton) and [GP-VTON](https://github.com/xiezhy6/GP-VTON).