Code for paper "Improved GUI Grounding via Iterative Narrowing"

ant-8/GUI-Grounding-via-Iterative-Narrowing

The code for the paper: Improved GUI Grounding via Iterative Narrowing

Abstract

Graphical User Interface (GUI) grounding plays a crucial role in enhancing the capabilities of Vision-Language Model (VLM) agents. While general VLMs, such as GPT-4V, demonstrate strong performance across various tasks, their proficiency in GUI grounding remains suboptimal. Recent studies have focused on fine-tuning these models specifically for zero-shot GUI grounding, yielding significant improvements over baseline performance. We introduce a visual prompting framework that employs an iterative narrowing mechanism to improve the performance of both general and fine-tuned models. For general models, we observed improvements of up to 61%. We evaluated our method on a comprehensive benchmark comprising various UI platforms, and we provide the code to reproduce our results.
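The abstract does not spell out the narrowing loop, so the following is only a minimal sketch of one way such a mechanism could work, not the repository's implementation. It assumes a hypothetical `predict` callable that stands in for a VLM query and returns a click position normalised to the current crop, a hypothetical `shrink` factor of 0.5, and that `n` counts narrowing steps before a final prediction.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
Point = Tuple[float, float]              # (x, y) in pixels


def narrow_box(box: Box, point: Point, shrink: float, bounds: Box) -> Box:
    """Return a box `shrink` times the size of `box`, centred on `point`
    and shifted where needed so that it stays inside `bounds`."""
    left, top, right, bottom = box
    width = (right - left) * shrink
    height = (bottom - top) * shrink
    b_left, b_top, b_right, b_bottom = bounds
    new_left = min(max(point[0] - width / 2, b_left), b_right - width)
    new_top = min(max(point[1] - height / 2, b_top), b_bottom - height)
    return (new_left, new_top, new_left + width, new_top + height)


def to_global(box: Box, rel: Point) -> Point:
    """Map a prediction normalised to `box` back to full-image pixels."""
    left, top, right, bottom = box
    return (left + rel[0] * (right - left), top + rel[1] * (bottom - top))


def iterative_narrowing(
    image_size: Tuple[int, int],
    predict: Callable[[Box], Point],  # hypothetical stand-in for a VLM call
    n: int = 3,                       # assumed: number of narrowing steps
    shrink: float = 0.5,              # assumed: crop size ratio per step
) -> Point:
    """Predict, crop around the prediction, and repeat `n` times,
    then return one final prediction made on the last crop."""
    full: Box = (0.0, 0.0, float(image_size[0]), float(image_size[1]))
    box = full
    for _ in range(n):
        point = to_global(box, predict(box))
        box = narrow_box(box, point, shrink, full)
    return to_global(box, predict(box))
```

In practice `predict` would crop the screenshot to `box` (for example with Pillow), send the crop and the instruction to the model, and parse the returned coordinates. Working on boxes rather than pixels keeps the sketch independent of any particular model API.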

| Model | Baseline (%) | IN, n=3 (%) |
| --- | --- | --- |
| InternVL-2-4B | 4.32 | 6.53 |
| Qwen2-VL-7B | 42.89 | 69.1 |
| ShowUI-2B | 75.1 | 79.56 |
| OS-Atlas-Base-7B | 82.47 | 83.33 |

Table 1: Overall average accuracy (%) comparing baseline against our method (IN) on the ScreenSpot benchmark.
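As a quick check on how the table relates to the "up to 61%" figure in the abstract, the sketch below computes the relative gain of IN over the baseline for each model. Reading the 61% as a relative (not absolute) improvement is an assumption; the accuracy values are copied from Table 1.

```python
# Accuracy values (%) copied from Table 1: (baseline, IN with n=3).
RESULTS = {
    "InternVL-2-4B": (4.32, 6.53),
    "Qwen2-VL-7B": (42.89, 69.1),
    "ShowUI-2B": (75.1, 79.56),
    "OS-Atlas-Base-7B": (82.47, 83.33),
}


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


GAINS = {name: relative_gain(b, i) for name, (b, i) in RESULTS.items()}

if __name__ == "__main__":
    for name, gain in GAINS.items():
        print(f"{name}: {gain:.1f}% relative gain")
```

Under that reading, the largest relative gain belongs to Qwen2-VL-7B, at roughly 61%, which matches the abstract.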

ScreenSpot Setup

  1. Create a screenspot/images directory.
  2. Follow the steps from this repository to download the ScreenSpot images.
  3. Place the images in the screenspot/images directory.
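The directory step above can be scripted. The sketch below creates `screenspot/images` and counts the image files it finds there. The helper names and the set of file suffixes are illustrative assumptions, not part of this repository.

```python
from pathlib import Path

# Assumed image formats; adjust if the downloaded files use other suffixes.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def prepare_image_dir(root: str = ".") -> Path:
    """Create <root>/screenspot/images if it is missing and return its path."""
    image_dir = Path(root) / "screenspot" / "images"
    image_dir.mkdir(parents=True, exist_ok=True)
    return image_dir


def count_images(image_dir: Path) -> int:
    """Count the files in `image_dir` that have a known image suffix."""
    return sum(
        1
        for path in image_dir.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    directory = prepare_image_dir()
    print(f"{directory}: {count_images(directory)} image(s) found")
```

Running it again after copying the images confirms that they landed in the expected place.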

Dependencies

  • pillow
  • torch
  • numpy
  • transformers
  • qwen_vl_utils
  • flash-attn
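To confirm that the environment has everything listed above, a small check like the following may help. It is a sketch only: the mapping from package name to import name (`pillow` to `PIL`, `flash-attn` to `flash_attn`) reflects the usual import names of those packages, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec
from typing import Dict, List

# Package name as listed above -> top-level module name used at import time.
DEPENDENCIES: Dict[str, str] = {
    "pillow": "PIL",
    "torch": "torch",
    "numpy": "numpy",
    "transformers": "transformers",
    "qwen_vl_utils": "qwen_vl_utils",
    "flash-attn": "flash_attn",
}


def missing_modules(dependencies: Dict[str, str]) -> List[str]:
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in dependencies.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(DEPENDENCIES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as torch.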

Citation

@misc{nguyen2024improvedguigroundingiterative,
      title={Improved GUI Grounding via Iterative Narrowing}, 
      author={Anthony Nguyen},
      year={2024},
      eprint={2411.13591},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.13591},
}
