GUI-Owl

GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. Furthermore, it can be instantiated as various specialized agents within the Mobile-Agent-V3 multi-agent framework to accomplish more complex tasks.

Performance

ScreenSpot-V2, ScreenSpot-Pro and OSWorld-G

MMBench-GUI L1, L2 and Android Control

Android World and OSWorld-Verified

Usage

Please refer to our cookbook.

Deploy

We recommend deploying GUI-Owl-32B with vLLM.

This script has been validated on an A100 GPU with 96 GB of VRAM. If you serve GUI-Owl-32B on an H20-3e, you can set MP_SIZE=1 for faster inference.

CKPT=mPLUG/GUI-Owl-32B   # or a local checkpoint path
PIXEL_ARGS='{"min_pixels":3136,"max_pixels":10035200}'
IMAGE_LIMIT_ARGS='image=2'
MP_SIZE=2
MM_KWARGS=(
    --mm-processor-kwargs "$PIXEL_ARGS"
    --limit-mm-per-prompt "$IMAGE_LIMIT_ARGS"
)

vllm serve "$CKPT" \
    --max-model-len 32768 "${MM_KWARGS[@]}" \
    --tensor-parallel-size "$MP_SIZE" \
    --allowed-local-media-path '/' \
    --port 4243
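Once the server is up, it exposes an OpenAI-compatible chat-completions API. A minimal client sketch using only the Python standard library is shown below; the image path and the prompt are illustrative placeholders, and the model name must match the checkpoint you served.

```python
import json
import urllib.request

# Endpoint for the vllm serve command above (note the port).
SERVER = "http://localhost:4243/v1/chat/completions"

def build_request(model: str, text: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload with one screenshot and one instruction."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": text},
            ],
        }],
    }

def query(payload: dict) -> str:
    """POST the payload to the local vLLM server and return the model's reply."""
    req = urllib.request.Request(
        SERVER,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    payload = build_request(
        "mPLUG/GUI-Owl-32B",                       # must match the served checkpoint
        "Locate the Settings icon on this screen.",
        "file:///path/to/screenshot.png",          # hypothetical local screenshot
    )
    print(query(payload))
```

Local `file://` image URLs work here because the serve command passes `--allowed-local-media-path '/'`.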

If you want GUI-Owl to receive more than two images per prompt, you can increase IMAGE_LIMIT_ARGS and reduce max_pixels so that the total number of visual tokens still fits within the context window.

For example:

CKPT=mPLUG/GUI-Owl-32B   # or a local checkpoint path
PIXEL_ARGS='{"min_pixels":3136,"max_pixels":3211264}'
IMAGE_LIMIT_ARGS='image=5'
MP_SIZE=2
MM_KWARGS=(
    --mm-processor-kwargs "$PIXEL_ARGS"
    --limit-mm-per-prompt "$IMAGE_LIMIT_ARGS"
)

vllm serve "$CKPT" \
    --max-model-len 32768 "${MM_KWARGS[@]}" \
    --tensor-parallel-size "$MP_SIZE" \
    --allowed-local-media-path '/' \
    --port 4243
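The trade-off between image count and max_pixels is a visual-token budget. Assuming the Qwen2.5-VL convention of one visual token per 28x28 pixel patch (so min_pixels=3136 corresponds to 4 tokens per image), a rough upper-bound check for the two configurations above is:

```python
# Rough visual-token budget for the two serve configurations above.
# Assumption: one visual token per 28x28 pixel patch (Qwen2.5-VL convention);
# treat these as upper-bound estimates, not exact per-image counts.

PATCH = 28 * 28  # pixels per visual token

def max_image_tokens(max_pixels: int) -> int:
    """Upper bound on visual tokens for a single image."""
    return max_pixels // PATCH

# First config: 2 images at max_pixels=10035200 -> 2 * 12800 = 25600 tokens
print(2 * max_image_tokens(10035200))

# Second config: 5 images at max_pixels=3211264 -> 5 * 4096 = 20480 tokens
print(5 * max_image_tokens(3211264))
```

Both budgets leave room for text tokens inside the 32768-token `--max-model-len`, which is why raising the image limit goes hand in hand with lowering max_pixels.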

Citation

If you find our paper and model useful in your research, please consider citing us:

@misc{ye2025mobileagentv3foundamentalagentsgui,
      title={Mobile-Agent-v3: Foundamental Agents for GUI Automation}, 
      author={Jiabo Ye and Xi Zhang and Haiyang Xu and Haowei Liu and Junyang Wang and Zhaoqing Zhu and Ziwei Zheng and Feiyu Gao and Junjie Cao and Zhengxi Lu and Jitong Liao and Qi Zheng and Fei Huang and Jingren Zhou and Ming Yan},
      year={2025},
      eprint={2508.15144},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.15144}, 
}