Code for the paper *Improved GUI Grounding via Iterative Narrowing*.
Graphical User Interface (GUI) grounding plays a crucial role in enhancing the capabilities of Vision-Language Model (VLM) agents. While general VLMs, such as GPT-4V, demonstrate strong performance across various tasks, their proficiency in GUI grounding remains suboptimal. Recent studies have focused on fine-tuning these models specifically for zero-shot GUI grounding, yielding significant improvements over baseline performance. We introduce a visual prompting framework that employs an iterative narrowing mechanism to improve the performance of both general and fine-tuned models. For general models, we observed improvements of up to 61%. For evaluation, we tested our method on a comprehensive benchmark comprising various UI platforms and provide the code to reproduce our results.
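To make the mechanism concrete, here is a minimal sketch of one narrowing loop in Python: the model proposes a point, the image is cropped around that point, and the process repeats on the crop before mapping the final prediction back to the original frame. This is illustrative only, not the repository's implementation; the `predict_point` callable, the `crop_ratio` of 0.5, and the centred-crop policy are assumptions made for exposition.

```python
from PIL import Image

def iterative_narrowing(predict_point, image, query, n=3, crop_ratio=0.5):
    """Sketch of the narrowing loop. `predict_point(image, query)` is any
    callable that returns an (x, y) pixel guess from a VLM; the actual
    prompt format is model-specific and not shown here."""
    off_x, off_y = 0, 0  # top-left of the current crop in original coords
    current = image
    for step in range(n):
        x, y = predict_point(current, query)
        if step == n - 1:
            # Last iteration: map the prediction back to the full image.
            return off_x + x, off_y + y
        # Crop a window of crop_ratio * current size, centred on the
        # prediction and clamped to the image bounds.
        w, h = current.size
        cw, ch = int(w * crop_ratio), int(h * crop_ratio)
        left = min(max(int(x - cw / 2), 0), w - cw)
        top = min(max(int(y - ch / 2), 0), h - ch)
        current = current.crop((left, top, left + cw, top + ch))
        off_x, off_y = off_x + left, off_y + top

# Toy usage with a dummy predictor that always picks the crop centre.
if __name__ == "__main__":
    img = Image.new("RGB", (1920, 1080))
    centre = lambda im, q: (im.size[0] // 2, im.size[1] // 2)
    print(iterative_narrowing(centre, img, "the search button", n=3))
```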
| Models | Baseline | IN (n=3) |
|---|---|---|
| InternVL-2-4B | 4.32 | 6.53 |
| Qwen2-VL-7B | 42.89 | 69.10 |
| ShowUI-2B | 75.10 | 79.56 |
| OS-Atlas-Base-7B | 82.47 | 83.33 |
Table 1: Overall average accuracy (%) comparing baseline against our method (IN) on the ScreenSpot benchmark.
- Create a `screenspot/images` directory (see the snippet below).
- Follow the steps from this repository to download the ScreenSpot images.
- Place the images in the newly created directory.
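As a convenience, the expected directory can be created and sanity-checked from Python; the `screenspot/images` path is the one from the steps above, and everything else is just a sketch.

```python
from pathlib import Path

# Expected location of the ScreenSpot screenshots.
images_dir = Path("screenspot/images")
images_dir.mkdir(parents=True, exist_ok=True)

# After downloading, the directory should not be empty.
n_files = sum(1 for p in images_dir.iterdir() if p.is_file())
print(f"Found {n_files} files in {images_dir}")
```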
- pillow
- torch
- numpy
- transformers
- qwen_vl_utils
- flash-attn
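Assuming a standard Python environment, the packages above can be installed with pip (note that `flash-attn` generally expects `torch` to be installed first):

```
pip install pillow torch numpy transformers qwen_vl_utils flash-attn
```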
```bibtex
@misc{nguyen2024improvedguigroundingiterative,
  title={Improved GUI Grounding via Iterative Narrowing},
  author={Anthony Nguyen},
  year={2024},
  eprint={2411.13591},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.13591},
}
```