Mobile User Interface Element Detection Via Adaptively Prompt Tuning

Bibliographic details
Main authors:	Gu, Zhangxuan ; Xu, Zhuoer ; Chen, Haoxing ; Lan, Jun ; Meng, Changhua ; Wang, Weiqiang
Format:	Article
Language:	eng
Subjects:	Computer Science - Computer Vision and Pattern Recognition

creator	Gu, Zhangxuan ; Xu, Zhuoer ; Chen, Haoxing ; Lan, Jun ; Meng, Changhua ; Wang, Weiqiang
description	Recent object detection approaches rely on pretrained vision-language models for image-text alignment. However, they fail to detect Mobile User Interface (MUI) elements, because such elements carry additional OCR information that describes their content and function but is often ignored. In this paper, we develop a new MUI element detection dataset named MUI-zh and propose an Adaptively Prompt Tuning (APT) module to take advantage of discriminative OCR information. APT is a lightweight and effective module that jointly optimizes category prompts across different modalities. For every element, APT uniformly encodes its visual features and OCR descriptions to dynamically adjust the representation of frozen category prompts. We evaluate the effectiveness of our plug-and-play APT on several existing CLIP-based detectors for both standard and open-vocabulary MUI element detection. Extensive experiments show that our method achieves considerable improvements on two datasets. The datasets are available at github.com/antmachineintelligence/MUI-zh.
doi_str_mv	10.48550/arxiv.2305.09699
format	Article
date	2023-05-16
rights	http://creativecommons.org/licenses/by/4.0
linktorsrc	https://arxiv.org/abs/2305.09699
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2305.09699
language	eng
recordid	cdi_arxiv_primary_2305_09699
source	arXiv.org
subjects	Computer Science - Computer Vision and Pattern Recognition
title	Mobile User Interface Element Detection Via Adaptively Prompt Tuning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T19%3A32%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mobile%20User%20Interface%20Element%20Detection%20Via%20Adaptively%20Prompt%20Tuning&rft.au=Gu,%20Zhangxuan&rft.date=2023-05-16&rft_id=info:doi/10.48550/arxiv.2305.09699&rft_dat=%3Carxiv_GOX%3E2305_09699%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true