MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents Paper • 2509.06477 • Published Sep 8 • 2
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning Paper • 2509.11543 • Published Sep 15 • 47
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization Paper • 2503.23733 • Published Mar 31 • 10
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections Paper • 2205.12005 • Published May 24, 2022
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published Jun 5 • 19
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model Paper • 2311.18248 • Published Nov 30, 2023
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Paper • 2409.03420 • Published Sep 5, 2024 • 26
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality Paper • 2304.14178 • Published Apr 27, 2023 • 3
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model Paper • 2310.05126 • Published Oct 8, 2023 • 1