VAREdit / README.md

Remove library name (#3)

520ddb6 verified 3 days ago

4.98 kB

	---
	base_model:
	- FoundationVision/Infinity
	language:
	- en
	license: mit
	pipeline_tag: image-to-image
	tags:
	- image-editing
	- HiDream.ai
	---

	# VAREdit: Visual Autoregressive Modeling for Instruction-Guided Image Editing

	[📄 Paper](https://huggingface.co/papers/2508.15772)

	![VAREdit Demo](assets/demo.jpg)

	[VAREdit](https://github.com/HiDream-ai/VAREdit) is an advanced image editing model built on the [Infinity](https://huggingface.co/FoundationVision/infinity) models, designed for high-quality instruction-based image editing.


	Try our online demos: [🤗VAREdit-8B-1024](https://huggingface.co/spaces/HiDream-ai/VAREdit-8B-1024) and [🤗VAREdit-8B-512](https://huggingface.co/spaces/HiDream-ai/VAREdit-8B-512).

	## 🌟 Key Features

	- Strong Instruction Follow: Follows instructions more accurately due to the autoregressive nature of the model.
	- Efficient Inference: Optimized for fast generation with less than 1 seconds for 8B model.
	- Flexible Resolution: Supports 512×512 and 1024×1024 image resolutions
	![VAREdit Demo](assets/framework.jpg)

	## 📊 Model Variants

	\| Model Variant \| Resolutions \| HuggingFace Model \| Time (H800) \| VRAM (GB) \|
	\|:--------------\|:------------\|:---------------------------------------------------------------------------------\|:----------\|:----------\|
	\| VAREdit-8B-512 \| 512×512 \| [VAREdit-8B-512](https://huggingface.co/HiDream-ai/VAREdit) \| ~0.7s \| 50.41 \|
	\| VAREdit-8B-1024 \| 1024×1024 \| [VAREdit-8B-1024](https://huggingface.co/HiDream-ai/VAREdit) \| ~1.99s \| 50.41 \|

	## 🚀 Quick Start

	### Prerequisites

	Before starting, ensure you have:
	- Python 3.8+
	- CUDA-compatible GPU with sufficient VRAM (8GB+ for 2B model, 24GB+ for 8B model)
	- Required dependencies installed

	### Installation

	1. Clone the repository
	```bash
	git clone https://github.com/HiDream-ai/VAREdit.git
	cd VAREdit
	```

	2. Install dependencies
	```bash
	pip install -r requirements.txt
	```

	3. Download model checkpoints

	Download the VAREdit model checkpoints:
	```bash
	# Download from HuggingFace
	git lfs install
	git clone https://huggingface.co/HiDream-ai/VAREdit
	```

	### Basic Usage

	```python
	from infer import load_model, generate_image

	model_components = load_model(
	pretrain_root="HiDream-ai/VAREdit",
	model_path="HiDream-ai/VAREdit/8B-1024.pth",
	model_size="8B",
	image_size=1024
	)

	# Generate edited image
	edited_image = generate_image(
	model_components,
	src_img_path="assets/test.jpg",
	instruction="Add glasses to this girl and change hair color to red",
	cfg=3.0, # Classifier-free guidance scale
	tau=0.1, # Temperature parameter
	seed=42 # Optional random seed
	)
	```

	## 📝 Detailed Configuration

	### Model Sampling Parameters

	\| Parameter \| Description \| Default \|
	\|:----------\|:------------\|:--------\|
	\| `cfg` \| Classifier-free guidance scale \| 3.0 \|
	\| `tau` \| Temperature for sampling \| 0.1 \|
	\| `seed` \| Random seed for reproducibility \| -1 (random) \|

	## 📂 Project Structure

	```
	VAREdit/
	├── infer.py # Main inference script
	├── infinity/ # Core model implementations
	│ ├── models/ # Model architectures
	│ ├── dataset/ # Data processing utilities
	│ └── utils/ # Helper functions
	├── tools/ # Additional tools and scripts
	│ └── run_infinity.py # Model execution utilities
	├── assets/ # Demo images and resources
	└── README.md # This file
	```

	## 📊 Performance Benchmarks
	\| Method \| Size \| EMU-Edit Bal. \| PIE-Bench Bal. \| Time (A800) \|
	\|:---\|:---:\|:---:\|:---:\|:---:\|
	\| InstructPix2Pix \| 1.1B \| 2.923 \| 4.034 \| 3.5s \|
	\| UltraEdit \| 7.7B \| 4.541 \| 5.580 \| 2.6s \|
	\| OmniGen \| 3.8B \| 4.674 \| 3.492 \| 16.5s \|
	\| AnySD \| 2.9B \| 3.129 \| 3.326 \| 3.4s \|
	\| EditAR \| 0.8B \| 3.305 \| 4.707 \| 45.5s \|
	\| ACE++ \| 16.9B \| 2.076 \| 2.574 \| 5.7s \|
	\| ICEdit \| 17.0B \| 4.785 \| 4.933 \| 8.4s \|
	\| VAREdit (256px) \| 2.2B \| 5.565 \| 6.684 \| 0.5s \|
	\| VAREdit (512px) \| 2.2B \| 5.662 \| 6.996 \| 0.7s \|
	\| VAREdit (512px) \| 8.4B \| 7.792 \| 8.105 \| 1.2s \|
	\| VAREdit (1024px) \| 8.4B \| 7.379 \| 7.688 \| 3.9s \|

	Note: The released 8B models are trained longer and on more data, so the performances are better than that in the paper.

	## 📄 License

	This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

	## 📚 Citation

	If you use VAREdit in your research, please cite:

	```bibtex
	@article{varedit2025,
	title={Visual Autoregressive Modeling for Instruction-Guided Image Editing},
	author={Mao, Qingyang and Cai, Qi and Li, Yehao and Pan, Yingwei and Cheng, Mingyue and Yao, Ting and Liu, Qi and Mei, Tao},
	journal={arXiv preprint},
	year={2025}
	}
	```

	## 🙏 Acknowledgments

	- Built on the [Infinity](https://huggingface.co/FoundationVision/infinity) models

	Note: This project is under active development. Features and code may change.