Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Recent object detection approaches rely on pretrained vision-language models for image-text alignment. However, they fail to detect the Mobile User Interface (MUI) element since it contains additional OCR information, which describes its content and function but is often ignored. In this paper, we d...
Main authors: | Gu, Zhangxuan ; Xu, Zhuoer ; Chen, Haoxing ; Lan, Jun ; Meng, Changhua ; Wang, Weiqiang |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Gu, Zhangxuan ; Xu, Zhuoer ; Chen, Haoxing ; Lan, Jun ; Meng, Changhua ; Wang, Weiqiang |
description | Recent object detection approaches rely on pretrained vision-language models
for image-text alignment. However, they fail to detect the Mobile User
Interface (MUI) element since it contains additional OCR information, which
describes its content and function but is often ignored. In this paper, we
develop a new MUI element detection dataset named MUI-zh and propose an
Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR
information. APT is a lightweight and effective module to jointly optimize
category prompts across different modalities. For every element, APT uniformly
encodes its visual features and OCR descriptions to dynamically adjust the
representation of frozen category prompts. We evaluate the effectiveness of our
plug-and-play APT upon several existing CLIP-based detectors for both standard
and open-vocabulary MUI element detection. Extensive experiments show that our
method achieves considerable improvements on two datasets. The dataset is
available at github.com/antmachineintelligence/MUI-zh. |
doi_str_mv | 10.48550/arxiv.2305.09699 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2305_09699</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2305_09699</sourcerecordid><originalsourceid>FETCH-LOGICAL-a679-1ab6f1e518ec1dcf94a6bf5ced5a1f1c1a9d255dc1edc6dc69ca7509047fa873</originalsourceid><addsrcrecordid>eNotz11LwzAUxvHceCFzH8Ar8wXa5axN2lyOvehgouD0tpwmJxJo05Jlw317tyk88L974MfYI4i8rKUUM4w__pTPCyFzoZXW92z1OrS-I_55oMi3IVF0aIivO-opJL6iRCb5IfAvj3xhcUz-RN2Zv8ehHxPfH4MP3w_szmF3oOl_J-xjs94vX7Ld2_N2udhlqCqdAbbKAUmoyYA1TpeoWicNWYngwABqO5fSGiBr1GXaYCWFFmXlsK6KCXv6e70pmjH6HuO5uWqam6b4BUsXRhs</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Mobile User Interface Element Detection Via Adaptively Prompt Tuning</title><source>arXiv.org</source><creator>Gu, Zhangxuan ; Xu, Zhuoer ; Chen, Haoxing ; Lan, Jun ; Meng, Changhua ; Wang, Weiqiang</creator><creatorcontrib>Gu, Zhangxuan ; Xu, Zhuoer ; Chen, Haoxing ; Lan, Jun ; Meng, Changhua ; Wang, Weiqiang</creatorcontrib><description>Recent object detection approaches rely on pretrained vision-language models
for image-text alignment. However, they fail to detect the Mobile User
Interface (MUI) element since it contains additional OCR information, which
describes its content and function but is often ignored. In this paper, we
develop a new MUI element detection dataset named MUI-zh and propose an
Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR
information. APT is a lightweight and effective module to jointly optimize
category prompts across different modalities. For every element, APT uniformly
encodes its visual features and OCR descriptions to dynamically adjust the
representation of frozen category prompts. We evaluate the effectiveness of our
plug-and-play APT upon several existing CLIP-based detectors for both standard
and open-vocabulary MUI element detection. Extensive experiments show that our
method achieves considerable improvements on two datasets. The dataset is
available at \url{github.com/antmachineintelligence/MUI-zh}.</description><identifier>DOI: 10.48550/arxiv.2305.09699</identifier><language>eng</language><subject>Computer Science - Computer Vision and Pattern Recognition</subject><creationdate>2023-05</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2305.09699$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2305.09699$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Gu, Zhangxuan</creatorcontrib><creatorcontrib>Xu, Zhuoer</creatorcontrib><creatorcontrib>Chen, Haoxing</creatorcontrib><creatorcontrib>Lan, Jun</creatorcontrib><creatorcontrib>Meng, Changhua</creatorcontrib><creatorcontrib>Wang, Weiqiang</creatorcontrib><title>Mobile User Interface Element Detection Via Adaptively Prompt Tuning</title><description>Recent object detection approaches rely on pretrained vision-language models
for image-text alignment. However, they fail to detect the Mobile User
Interface (MUI) element since it contains additional OCR information, which
describes its content and function but is often ignored. In this paper, we
develop a new MUI element detection dataset named MUI-zh and propose an
Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR
information. APT is a lightweight and effective module to jointly optimize
category prompts across different modalities. For every element, APT uniformly
encodes its visual features and OCR descriptions to dynamically adjust the
representation of frozen category prompts. We evaluate the effectiveness of our
plug-and-play APT upon several existing CLIP-based detectors for both standard
and open-vocabulary MUI element detection. Extensive experiments show that our
method achieves considerable improvements on two datasets. The dataset is
available at \url{github.com/antmachineintelligence/MUI-zh}.</description><subject>Computer Science - Computer Vision and Pattern Recognition</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotz11LwzAUxvHceCFzH8Ar8wXa5axN2lyOvehgouD0tpwmJxJo05Jlw317tyk88L974MfYI4i8rKUUM4w__pTPCyFzoZXW92z1OrS-I_55oMi3IVF0aIivO-opJL6iRCb5IfAvj3xhcUz-RN2Zv8ehHxPfH4MP3w_szmF3oOl_J-xjs94vX7Ld2_N2udhlqCqdAbbKAUmoyYA1TpeoWicNWYngwABqO5fSGiBr1GXaYCWFFmXlsK6KCXv6e70pmjH6HuO5uWqam6b4BUsXRhs</recordid><startdate>20230516</startdate><enddate>20230516</enddate><creator>Gu, Zhangxuan</creator><creator>Xu, Zhuoer</creator><creator>Chen, Haoxing</creator><creator>Lan, Jun</creator><creator>Meng, Changhua</creator><creator>Wang, Weiqiang</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20230516</creationdate><title>Mobile User Interface Element Detection Via Adaptively Prompt Tuning</title><author>Gu, Zhangxuan ; Xu, Zhuoer ; Chen, Haoxing ; Lan, Jun ; Meng, Changhua ; Wang, Weiqiang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a679-1ab6f1e518ec1dcf94a6bf5ced5a1f1c1a9d255dc1edc6dc69ca7509047fa873</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer Science - Computer Vision and Pattern Recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Gu, Zhangxuan</creatorcontrib><creatorcontrib>Xu, Zhuoer</creatorcontrib><creatorcontrib>Chen, Haoxing</creatorcontrib><creatorcontrib>Lan, Jun</creatorcontrib><creatorcontrib>Meng, Changhua</creatorcontrib><creatorcontrib>Wang, Weiqiang</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Gu, Zhangxuan</au><au>Xu, 
Zhuoer</au><au>Chen, Haoxing</au><au>Lan, Jun</au><au>Meng, Changhua</au><au>Wang, Weiqiang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mobile User Interface Element Detection Via Adaptively Prompt Tuning</atitle><date>2023-05-16</date><risdate>2023</risdate><abstract>Recent object detection approaches rely on pretrained vision-language models
for image-text alignment. However, they fail to detect the Mobile User
Interface (MUI) element since it contains additional OCR information, which
describes its content and function but is often ignored. In this paper, we
develop a new MUI element detection dataset named MUI-zh and propose an
Adaptively Prompt Tuning (APT) module to take advantage of discriminating OCR
information. APT is a lightweight and effective module to jointly optimize
category prompts across different modalities. For every element, APT uniformly
encodes its visual features and OCR descriptions to dynamically adjust the
representation of frozen category prompts. We evaluate the effectiveness of our
plug-and-play APT upon several existing CLIP-based detectors for both standard
and open-vocabulary MUI element detection. Extensive experiments show that our
method achieves considerable improvements on two datasets. The dataset is
available at \url{github.com/antmachineintelligence/MUI-zh}.</abstract><doi>10.48550/arxiv.2305.09699</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2305.09699 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2305_09699 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | Mobile User Interface Element Detection Via Adaptively Prompt Tuning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T19%3A32%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mobile%20User%20Interface%20Element%20Detection%20Via%20Adaptively%20Prompt%20Tuning&rft.au=Gu,%20Zhangxuan&rft.date=2023-05-16&rft_id=info:doi/10.48550/arxiv.2305.09699&rft_dat=%3Carxiv_GOX%3E2305_09699%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
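
The description field states that, for every element, APT encodes visual features and OCR descriptions to dynamically adjust the representation of frozen category prompts. The record gives no architectural detail, so the following is only a toy illustration of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the additive fusion of two linear projections, the `scale` parameter, and the hand-written three-dimensional vectors standing in for real CLIP embeddings.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit length (a copy of vec if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adapt_prompts(frozen_prompts, visual_feat, ocr_feat, w_visual, w_ocr, scale=0.1):
    """Shift each frozen category prompt by an element-specific offset.

    Hypothetical fusion: the offset is the sum of a linear projection of the
    visual feature and a linear projection of the OCR feature. The frozen
    prompts themselves are never modified; new vectors are returned.
    """
    offset = [v + t for v, t in zip(linear(w_visual, visual_feat),
                                    linear(w_ocr, ocr_feat))]
    return [l2_normalize([p + scale * o for p, o in zip(prompt, offset)])
            for prompt in frozen_prompts]


def score_element(frozen_prompts, visual_feat, ocr_feat, w_visual, w_ocr, scale=0.1):
    """Cosine similarity between the element's visual feature and each adapted prompt."""
    adapted = adapt_prompts(frozen_prompts, visual_feat, ocr_feat,
                            w_visual, w_ocr, scale)
    query = l2_normalize(visual_feat)
    return [dot(query, prompt) for prompt in adapted]


# Hypothetical stand-ins for frozen text embeddings of two categories.
FROZEN_PROMPTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Hypothetical projection weights (identity, for readability).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

visual = [0.6, 0.5, 0.1]  # made-up visual feature of one UI element
ocr = [0.0, 1.0, 0.0]     # made-up embedding of the element's OCR text

scores = score_element(FROZEN_PROMPTS, visual, ocr, IDENTITY, IDENTITY, scale=0.5)
predicted = scores.index(max(scores))
print(predicted, scores)
```

Setting `scale=0` disables the adjustment and reduces the sketch to plain matching against the frozen prompts, which makes it easy to compare scores with and without the per-element adaptation.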