Improve model card: Add pipeline tag and library name (#1)
- Improve model card: Add pipeline tag and library name (5a1ca879fcdc3b9123421f457a7f74a6253201e6)
Co-authored-by: Niels Rogge <[email protected]>
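This commit changes only the README's YAML front matter. It adds `pipeline_tag: image-text-to-text` and `library_name: transformers`. On the Hugging Face Hub, these keys set the task a model is listed under and declare the library its weights are meant for.

The stdlib-only sketch below reads such a header. It makes two assumptions:

- `read_front_matter` is a hypothetical helper, not a Hub API.
- The block is flat `key: value` pairs, as it is here.

For real model cards that contain nested YAML, the `ModelCard` class in `huggingface_hub` is the more robust route.

```python
from typing import Dict


def read_front_matter(card_text: str) -> Dict[str, str]:
    """Hypothetical helper: return flat `key: value` pairs from a card's front matter.

    Assumption: the front matter is flat (no nested lists or maps). Returns an
    empty dict when the text has no complete `---` ... `---` block.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # no closing delimiter, so this is not valid front matter


# Front matter of README.md after this commit.
NEW_HEADER = """---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

### UI-Venus
"""

meta = read_front_matter(NEW_HEADER)
print(meta["pipeline_tag"], meta["library_name"])
```

Before this commit, the same helper would have returned only the `license` key.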
README.md (changed)
@@ -1,6 +1,9 @@
 ---
 license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
+
 ### UI-Venus
 This repository contains the UI-Venus model from the report [UI-Venus: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833). UI-Venus is a native UI agent based on the Qwen2.5-VL multimodal large language model, designed to perform precise GUI element grounding and effective navigation using only screenshots as input. It achieves state-of-the-art performance through Reinforcement Fine-Tuning (RFT) with high-quality training data. More inference details and usage guides are available in the GitHub repository. We will continue to update results on standard benchmarks including Screenspot-v2/Pro and AndroidWorld.
 
@@ -34,12 +37,8 @@ Key innovations include:
 - **Efficient Data Cleaning**: Trained on several hundred thousand high-quality samples to ensure robustness.
 - **Self-Evolving Trajectory History Alignment & Sparse Action Enhancement**: Improves reasoning coherence and action distribution for better long-horizon planning.
 
-
-
-
-
 ---
-##
+## Installation
 
 First, install the required dependencies:
 
@@ -48,9 +47,7 @@ pip install transformers==4.49.0 qwen-vl-utils
 ```
 ---
 
-
-
-## Quick Start
+## Quick Start
 ```python
 from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
 from typing import Dict, Tuple, Any
@@ -227,7 +224,6 @@ This is the compressed package of validation trajectories for **AndroidWorld**,
 
 > **Table:** Performance comparison on **AndroidWorld** for end-to-end models. Our UI-Venus-Navi-72B achieves state-of-the-art performance, outperforming all baseline methods across different settings.
 
-
 ### Results on AndroidControl and GUI-Odyssey
 
 | Models | AndroidControl-Low<br>Type Acc. | AndroidControl-Low<br>Step SR | AndroidControl-High<br>Type Acc. | AndroidControl-High<br>Step SR | GUI-Odyssey<br>Type Acc. | GUI-Odyssey<br>Step SR |
@@ -253,7 +249,6 @@ This is the compressed package of validation trajectories for **AndroidWorld**,
 
 > **Table:** Performance comparison on offline UI navigation datasets including AndroidControl and GUI-Odyssey. Note that models with * are reproduced.
 
-
 # Citation
 Please consider citing if you find our work useful:
 ```plain
@@ -266,4 +261,4 @@ Please consider citing if you find our work useful:
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2508.10833},
 }
-```
+```
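The README's install command, quoted in one of the hunk headers, pins `transformers==4.49.0` and leaves `qwen-vl-utils` unpinned. The sketch below checks that an environment matches that command before the Quick Start is run. It assumes only the standard library's `importlib.metadata`. `PINS` and `check_pins` are hypothetical names, not part of the UI-Venus repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Hypothetical pin table mirroring the README's install line:
#   pip install transformers==4.49.0 qwen-vl-utils
# None means "must be installed, any version".
PINS: Dict[str, Optional[str]] = {"transformers": "4.49.0", "qwen-vl-utils": None}


def check_pins(pins: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable problems; an empty list means every pin is satisfied."""
    problems: List[str] = []
    for package, expected in pins.items():
        try:
            # Caveat: on older Python versions, lookup may be sensitive to
            # hyphen vs. underscore spelling of the distribution name.
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if expected is not None and installed != expected:
            problems.append(f"{package} {installed} is installed, README pins {expected}")
    return problems


if __name__ == "__main__":
    for problem in check_pins(PINS):
        print(problem)
```

An exact string comparison is used because the README pins one exact release rather than a version range.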
|