update model card.
README.md
CHANGED
@@ -6,6 +6,11 @@ base_model:

 # GUI-Actor-2B with Qwen2-VL-2B as backbone VLM

+This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://aka.ms/GUI-Actor).
+It is developed based on [Qwen2-VL-2B-Instruct ](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct), augmented by an attention-based action head and finetuned to perform GUI grounding using the dataset [here (coming soon)]().
+
+For more details on model design and evaluation, please check: [🏠 Project Page](https://aka.ms/GUI-Actor) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper]().
+
 | Model Name | Hugging Face Link |
 |--------------------------------------------|--------------------------------------------|
 | **GUI-Actor-7B-Qwen2-VL** | [🤗 Hugging Face](https://huggingface.co/microsoft/GUI-Actor-7B-Qwen2-VL) |
@@ -14,11 +19,6 @@ base_model:
 | **GUI-Actor-3B-Qwen2.5-VL (coming soon)** | [🤗 Hugging Face](https://huggingface.co/microsoft/GUI-Actor-3B-Qwen2.5-VL) |
 | **GUI-Actor-Verifier-2B** | [🤗 Hugging Face](https://huggingface.co/microsoft/GUI-Actor-Verifier-2B) |

-This model was introduced in the paper [**GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents**](https://aka.ms/GUI-Actor).
-It is developed based on [Qwen2-VL-2B-Instruct ](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct), augmented by an attention-based action head and finetuned to perform GUI grounding using the dataset [here (coming soon)]().
-
-For more details on model design and evaluation, please check: [🏠 Project Page](https://aka.ms/GUI-Actor) | [💻 Github Repo](https://github.com/microsoft/GUI-Actor) | [📑 Paper]().
-
 ## 📊 Performance Comparison on GUI Grounding Benchmarks
 Table 1. Main results on ScreenSpot-Pro, ScreenSpot, and ScreenSpot-v2 with **Qwen2-VL** as the backbone. † indicates scores obtained from our own evaluation of the official models on Huggingface.
 | Method | Backbone VLM | ScreenSpot-Pro | ScreenSpot | ScreenSpot-v2 |