Improve model card: Add pipeline tag, library name, update paper link, and add project page
This PR enhances the model card for GUI-Owl-7B by:
* Adding `pipeline_tag: image-text-to-text` to improve discoverability on the Hugging Face Hub, since the model is a vision-language model built for multimodal GUI automation.
* Specifying `library_name: transformers` to enable the automated "how to use" code snippet, since the model's `config.json` and related configuration files reference `transformers` classes (e.g., `Qwen2_5_VLForConditionalGeneration`, `Qwen2Tokenizer`, `Qwen2_5_VLProcessor`).
* Updating the "Paper" link to the official Hugging Face Papers page: [Mobile-Agent-v3: Foundamental Agents for GUI Automation](https://huggingface.co/papers/2508.15144).
* Adding a "Project Page" link: [https://osatlas.github.io/](https://osatlas.github.io/), which was identified in the associated GitHub repository's README.
These updates will make the model more accessible, discoverable, and user-friendly for the community.
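To confirm that the two new keys are present in the merged card, a minimal stdlib-only sketch like the following can parse the front matter and compare it against the expected values. It assumes the card uses only the flat YAML subset seen in this diff (`key: value` pairs and `- item` list entries); `parse_front_matter`, `missing_metadata`, and `EXPECTED` are hypothetical helpers, not part of any Hugging Face library. For real cards, a full YAML parser (for example PyYAML, or the `ModelCard` class in `huggingface_hub`) is the safer choice.

```python
# Minimal sketch: validate the model-card front matter this PR edits.
# Assumption: only the flat YAML subset used in this card is supported
# ("key: value" pairs and "- item" list entries). Not a general YAML parser.


def parse_front_matter(card_text: str) -> dict:
    """Parse the block between the leading '---' lines into a dict.

    Scalar keys map to strings; a key with an empty value followed by
    '- item' lines maps to a list of those items.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("card has no YAML front matter")
    meta: dict = {}
    list_key = None  # key that subsequent '- item' lines belong to
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            return meta
        if not stripped:
            continue
        if stripped.startswith("- "):
            if list_key is None:
                raise ValueError(f"list item without a list key: {line!r}")
            meta[list_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"unsupported front matter line: {line!r}")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            list_key = None
        else:
            meta[key] = []
            list_key = key
    raise ValueError("front matter is not closed with '---'")


# Hypothetical expectations mirroring the two keys added in this PR.
EXPECTED = {
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
}


def missing_metadata(meta: dict) -> list:
    """Return expected keys that are absent or hold a different value."""
    return [key for key, value in EXPECTED.items() if meta.get(key) != value]


OLD_CARD = """---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
tags:
- arxiv:2508.15144
---
"""

NEW_CARD = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
language:
- en
license: mit
tags:
- arxiv:2508.15144
pipeline_tag: image-text-to-text
library_name: transformers
---
"""

print(missing_metadata(parse_front_matter(OLD_CARD)))  # → ['pipeline_tag', 'library_name']
print(missing_metadata(parse_front_matter(NEW_CARD)))  # → []
```

The parser fails loudly on anything outside the supported subset rather than guessing, so an unexpected card layout surfaces as an error instead of a silently wrong result.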
@@ -1,11 +1,13 @@
 ---
-license: mit
-language:
-- en
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+language:
+- en
+license: mit
 tags:
 - arxiv:2508.15144
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
 # GUI-Owl
@@ -16,9 +18,10 @@ tags:
 
 GUI-Owl is a model series developed as part of the Mobile-Agent-V3 project. It achieves state-of-the-art performance across a range of GUI automation benchmarks, including ScreenSpot-V2, ScreenSpot-Pro, OSWorld-G, MMBench-GUI, Android Control, Android World, and OSWorld. Furthermore, it can be instantiated as various specialized agents within the Mobile-Agent-V3 multi-agent framework to accomplish more complex tasks.
 
-*
-*
-*
+* **Paper**: [Mobile-Agent-v3: Foundamental Agents for GUI Automation](https://huggingface.co/papers/2508.15144)
+* **Project Page**: [https://osatlas.github.io/](https://osatlas.github.io/)
+* **GitHub Repository**: https://github.com/X-PLUG/MobileAgent
+* **Online Demo**: Comming soon
 
 ## Performance
 
@@ -91,4 +94,4 @@ If you find our paper and model useful in your research, feel free to give us a
 primaryClass={cs.AI},
 url={https://arxiv.org/abs/2508.15144},
 }
-```
+```
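After this PR, the card refers to the same paper in three places: the `arxiv:2508.15144` tag, the BibTeX `url`, and the new **Paper** link. As a hedged sketch, a consistency check could derive both URLs from the tag. `paper_links` is a hypothetical helper, and it assumes that the two URL patterns used in this card (`https://arxiv.org/abs/<id>` and `https://huggingface.co/papers/<id>`) hold for any arXiv ID.

```python
# Minimal sketch: derive the paper URLs referenced in this card from its
# `arxiv:<id>` tags. Assumption: the two URL patterns used in the card
# generalize to any arXiv ID.

ARXIV_TAG_PREFIX = "arxiv:"


def paper_links(tags: list) -> dict:
    """Map each arXiv ID found in `tags` to its arXiv and Hub paper-page URLs."""
    links = {}
    for tag in tags:
        if tag.startswith(ARXIV_TAG_PREFIX):
            arxiv_id = tag[len(ARXIV_TAG_PREFIX):]
            links[arxiv_id] = {
                # Same pattern as the BibTeX `url` field.
                "arxiv": f"https://arxiv.org/abs/{arxiv_id}",
                # Same pattern as the updated **Paper** link.
                "hf_papers": f"https://huggingface.co/papers/{arxiv_id}",
            }
    return links


links = paper_links(["arxiv:2508.15144"])
print(links["2508.15144"]["hf_papers"])  # → https://huggingface.co/papers/2508.15144
```

Tags without the `arxiv:` prefix are ignored, so the helper can be fed the full `tags` list unchanged.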