Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Saved in:
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Recent object detection approaches rely on pretrained vision-language
models for image-text alignment. However, they fail to detect Mobile User
Interface (MUI) elements, since each element carries additional OCR information
that describes its content and function but is often ignored. In this paper, we
develop a new MUI element detection dataset named MUI-zh and propose an
Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR
information. APT is a lightweight and effective module that jointly optimizes
category prompts across different modalities. For every element, APT uniformly
encodes its visual features and OCR descriptions to dynamically adjust the
representation of frozen category prompts. We evaluate the effectiveness of our
plug-and-play APT on several existing CLIP-based detectors for both standard
and open-vocabulary MUI element detection. Extensive experiments show that our
method achieves considerable improvements on two datasets. The dataset is
available at \url{github.com/antmachineintelligence/MUI-zh}. (A minimal
illustrative sketch of the APT mechanism follows this record.) |
---|---|
DOI: | 10.48550/arxiv.2305.09699 |
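
The abstract above outlines APT's mechanism at a high level: per-element visual
features and OCR-text embeddings are jointly encoded to adjust frozen category
prompt embeddings before CLIP-style matching. Below is a minimal, hypothetical
PyTorch sketch of that idea. All names, dimensions, and design choices here
(the two-layer adapter, additive prompt offsets, cosine-similarity logits) are
assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of an adaptive-prompt-tuning module in the spirit of the
# abstract: an element's visual feature and OCR-text embedding jointly produce
# an offset that adjusts frozen category prompts for that element.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptivePromptTuning(nn.Module):
    """Adjusts frozen category prompts with per-element visual + OCR cues."""

    def __init__(self, embed_dim: int, num_categories: int):
        super().__init__()
        # Frozen category prompt embeddings (e.g., precomputed by a CLIP text
        # encoder from templates such as "a screenshot of a {category}").
        # Random here only so the sketch is self-contained.
        self.register_buffer(
            "category_prompts",
            F.normalize(torch.randn(num_categories, embed_dim), dim=-1),
        )
        # Lightweight adapter fusing the element's visual feature and the
        # embedding of its OCR description into a prompt offset.
        self.adapter = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim, embed_dim),
        )
        self.logit_scale = nn.Parameter(torch.tensor(10.0))

    def forward(self, visual_feat: torch.Tensor, ocr_feat: torch.Tensor) -> torch.Tensor:
        """
        visual_feat: (N, D) region features for N detected MUI elements.
        ocr_feat:    (N, D) text embeddings of each element's OCR string.
        Returns:     (N, C) classification logits over C categories.
        """
        # Jointly encode both modalities into a per-element prompt offset.
        offset = self.adapter(torch.cat([visual_feat, ocr_feat], dim=-1))  # (N, D)
        # Each element sees its own dynamically adjusted copy of the frozen
        # prompts; the base prompts themselves are never updated.
        prompts = F.normalize(
            self.category_prompts.unsqueeze(0) + offset.unsqueeze(1), dim=-1
        )                                                                  # (N, C, D)
        visual = F.normalize(visual_feat, dim=-1).unsqueeze(-1)            # (N, D, 1)
        return self.logit_scale * torch.bmm(prompts, visual).squeeze(-1)   # (N, C)


if __name__ == "__main__":
    apt = AdaptivePromptTuning(embed_dim=512, num_categories=20)
    vis = torch.randn(4, 512)   # four MUI element region features
    ocr = torch.randn(4, 512)   # embeddings of their OCR descriptions
    print(apt(vis, ocr).shape)  # torch.Size([4, 20])
```

Adding an offset to frozen prompts, rather than fine-tuning the prompts
themselves, keeps the pretrained text space intact, which is one plausible
reading of "dynamically adjust the representation of frozen category prompts"
in the abstract.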