Mobile User Interface Element Detection Via Adaptively Prompt Tuning

Recent object detection approaches rely on pretrained vision-language models for image-text alignment. However, they fail to detect the Mobile User Interface (MUI) element since it contains additional OCR information, which describes its content and function but is often ignored. In this paper, we d...

Detailed description

Bibliographic details
Main authors: Gu, Zhangxuan, Xu, Zhuoer, Chen, Haoxing, Lan, Jun, Meng, Changhua, Wang, Weiqiang
Format: Article
Language: eng
creator Gu, Zhangxuan
Xu, Zhuoer
Chen, Haoxing
Lan, Jun
Meng, Changhua
Wang, Weiqiang
description Recent object detection approaches rely on pretrained vision-language models for image-text alignment. However, they fail to detect Mobile User Interface (MUI) elements, because such elements carry additional OCR information that describes their content and function but is often ignored. In this paper, we develop a new MUI element detection dataset named MUI-zh and propose an Adaptively Prompt Tuning (APT) module to take advantage of discriminative OCR information. APT is a lightweight and effective module that jointly optimizes category prompts across different modalities. For every element, APT uniformly encodes its visual features and OCR descriptions to dynamically adjust the representation of frozen category prompts. We evaluate the effectiveness of our plug-and-play APT on several existing CLIP-based detectors for both standard and open-vocabulary MUI element detection. Extensive experiments show that our method achieves considerable improvements on two datasets. The dataset is available at github.com/antmachineintelligence/MUI-zh.
doi_str_mv 10.48550/arxiv.2305.09699
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2305.09699
language eng
recordid cdi_arxiv_primary_2305_09699
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title Mobile User Interface Element Detection Via Adaptively Prompt Tuning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T19%3A32%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mobile%20User%20Interface%20Element%20Detection%20Via%20Adaptively%20Prompt%20Tuning&rft.au=Gu,%20Zhangxuan&rft.date=2023-05-16&rft_id=info:doi/10.48550/arxiv.2305.09699&rft_dat=%3Carxiv_GOX%3E2305_09699%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
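
The description above says that APT encodes each element's visual features and OCR description and uses them to adjust frozen category prompts. The record gives no architectural details, so the following is only a minimal sketch of that general idea under stated assumptions, not the authors' implementation: the fusion by concatenation, the single linear map `w_adapt`, the additive offset, and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def adapt_prompts(frozen_prompts: Matrix, visual_feat: Vector,
                  ocr_feat: Vector, w_adapt: Matrix) -> Matrix:
    """Shift every frozen category prompt by an element-specific offset.

    Assumption: the offset is a linear map (w_adapt, the only trainable
    part in this sketch) of the concatenated visual and OCR features.
    The frozen prompts themselves are never modified in place.
    """
    fused = visual_feat + ocr_feat  # list concatenation, length 2 * d
    offset = [dot(row, fused) for row in w_adapt]  # length d
    return [l2_normalize([p + o for p, o in zip(prompt, offset)])
            for prompt in frozen_prompts]


def classify(visual_feat: Vector, prompts: Matrix) -> Tuple[int, List[float]]:
    """Return the best-matching category index and all cosine scores."""
    v = l2_normalize(visual_feat)
    scores = [dot(v, p) for p in prompts]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy 2-d embeddings for two hypothetical categories.
    prompts = [[1.0, 0.0], [0.0, 1.0]]
    visual, ocr = [0.9, 0.1], [0.0, 1.0]
    w = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
    adapted = adapt_prompts(prompts, visual, ocr, w)
    print(classify(visual, adapted))
```

With `w_adapt` set to all zeros the adapted prompts reduce to the normalized frozen prompts, which is one way to see that the adjustment is an add-on to an unchanged base detector, in keeping with the plug-and-play description.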