Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding

Given a language query, the temporal grounding task is to localize the temporal boundaries of the described event in an untrimmed video. A long-standing challenge is that multiple moments may be associated with the same video-query pair, a phenomenon termed label uncertainty. However, existing methods struggle to...
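
To make the label-uncertainty problem concrete, the following is a minimal sketch, not the paper's evaluation code. It assumes the temporal intersection-over-union (IoU) criterion that is commonly used to score temporal grounding; the helper names (`temporal_iou`, `hit_any`), the 0.5 threshold, and all timestamps are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Moment = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def hit_any(predictions: List[Moment], annotations: List[Moment],
            threshold: float = 0.5) -> bool:
    """True if any prediction overlaps any annotated moment at IoU >= threshold."""
    return any(temporal_iou(p, g) >= threshold
               for p in predictions for g in annotations)


# Illustrative data (assumed): the query matches two moments in the video,
# but the dataset annotates only the first one.
single_label = [(2.0, 8.0)]
plausible = [(2.0, 8.0), (20.0, 26.0)]
prediction = [(21.0, 26.0)]

print(hit_any(prediction, single_label))  # False: penalized under the single label
print(hit_any(prediction, plausible))     # True: correct if both moments were labeled
```

Under a single annotation, a prediction that lands on the second, equally valid moment counts as a miss, which is the situation the abstract calls label uncertainty.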

Bibliographic details
Published in:	IEEE transactions on circuits and systems for video technology, 2022-10, Vol.32 (10), p.7190-7203
Main authors:	Zhou, Hao; Zhang, Chongyang; Luo, Yan; Hu, Chuanping; Zhang, Wenjun
Format:	Article
Language:	eng
Subjects:	Annotations; Grounding; label uncertainty; Measurement; moment localization; Object recognition; Optimization; Predictive models; Queries; Query languages; Task analysis; Temporal grounding; Uncertainty

container_end_page	7203
container_issue	10
container_start_page	7190
container_title	IEEE transactions on circuits and systems for video technology
container_volume	32
creator	Zhou, Hao; Zhang, Chongyang; Luo, Yan; Hu, Chuanping; Zhang, Wenjun
description	Given a language query, the temporal grounding task is to localize the temporal boundaries of the described event in an untrimmed video. A long-standing challenge is that multiple moments may be associated with the same video-query pair, a phenomenon termed label uncertainty. However, existing methods struggle to localize diverse moments due to the lack of multi-label annotations. In this paper, we propose a novel Diverse Temporal Grounding framework (DTG) to achieve diverse moment localization with only single-label annotations. By delving into the label uncertainty, we find that the diverse moments retrieved tend to involve similar actions/objects, which motivates us to perceive these interest moments. Specifically, we construct soft multi-labels through the semantic similarity of multiple video-query pairs. These soft labels reveal whether multiple moments within the same video contain similar verbs/nouns, thereby guiding interest moment generation. Meanwhile, we put forward a diverse moment regression network (DMRNet) to produce multiple predictions in a single pass, where plausible moments are dynamically picked out from the interest moments for joint optimization. Moreover, we introduce new metrics that better reveal multi-output performance. Extensive experiments conducted on Charades-STA and ActivityNet Captions show that our method achieves state-of-the-art performance in terms of both standard and new metrics.
doi_str_mv	10.1109/TCSVT.2022.3179314
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_9785774</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9785774</ieee_id><sourcerecordid>2721428279</sourcerecordid><originalsourceid>FETCH-LOGICAL-c295t-d10abad756b4d37ddcf69393cd3171aea266d29403eb201b42a37b43c41c570b3</originalsourceid><addsrcrecordid>eNo9kEFPwzAMhSMEEmPwB-ASiXNH4iRNyw0NGJOGQKLbtUobFzJYOpIOaf-ejk2cbFl-z88fIZecjThn-U0xflsUI2AAI8F1Lrg8IgOuVJYAMHXc90zxJAOuTslZjEvGuMykHpBF8eH8p_PvdOqjs0jnvsbQGee77W0_6zBg7Ohzu0Lf0VcMNa4713ratIHeux8MEWmBq3UbzBedhHbjbe92Tk4a8xXx4lCHZP74UIyfktnLZDq-myU15KpLLGemMlartJJWaGvrJs1FLmrbf8ENGkhTC7lkAitgvJJghK6kqCWvlWaVGJLrve86tN-bPmm5bDfB9ydL0MAlZNDTGBLYb9WhjTFgU66DW5mwLTkrd_zKP37ljl954NeLrvYih4j_glxnSmspfgGAAGzV</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2721428279</pqid></control><display><type>article</type><title>Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding</title><source>IEEE Electronic Library (IEL)</source><creator>Zhou, Hao ; Zhang, Chongyang ; Luo, Yan ; Hu, Chuanping ; Zhang, Wenjun</creator><creatorcontrib>Zhou, Hao ; Zhang, Chongyang ; Luo, Yan ; Hu, Chuanping ; Zhang, Wenjun</creatorcontrib><description>Given a language query, temporal grounding task is to localize temporal boundaries of the described event in an untrimmed video. There is a long-standing challenge that multiple moments may be associated with one same video-query pair, termed label uncertainty. However, existing methods struggle to localize diverse moments due to the lack of multi-label annotations. In this paper, we propose a novel Diverse Temporal Grounding framework (DTG) to achieve diverse moment localization with only single-label annotations. By delving into the label uncertainty, we find the diverse moments retrieved tend to involve similar actions/objects, driving us to perceive these interest moments. 
Specifically, we construct soft multi-label through semantic similarity of multiple video-query pairs. These soft labels reveal whether multiple moments in the intra-videos contain similar verbs/nouns, thereby guiding interest moment generation. Meanwhile, we put forward a diverse moment regression network (DMRNet) to achieve multiple predictions in a single pass, where plausible moments are dynamically picked out from the interest moments for joint optimization. Moreover, we introduce new metrics that better reveal multi-output performance. Extensive experiments conducted on Charades-STA and ActivityNet Captions show that our method achieves state-of-the-art performance in terms of both standard and new metrics.</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2022.3179314</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Annotations ; Grounding ; label uncertainty ; Measurement ; moment localization ; Object recognition ; Optimization ; Predictive models ; Queries ; Query languages ; Task analysis ; Temporal grounding ; Uncertainty</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2022-10, Vol.32 (10), p.7190-7203</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c295t-d10abad756b4d37ddcf69393cd3171aea266d29403eb201b42a37b43c41c570b3</citedby><cites>FETCH-LOGICAL-c295t-d10abad756b4d37ddcf69393cd3171aea266d29403eb201b42a37b43c41c570b3</cites><orcidid>0000-0001-8799-1182 ; 0000-0002-0173-0393 ; 0000-0001-7292-0445 ; 0000-0002-1394-4452</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9785774$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9785774$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhou, Hao</creatorcontrib><creatorcontrib>Zhang, Chongyang</creatorcontrib><creatorcontrib>Luo, Yan</creatorcontrib><creatorcontrib>Hu, Chuanping</creatorcontrib><creatorcontrib>Zhang, Wenjun</creatorcontrib><title>Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>Given a language query, temporal grounding task is to localize temporal boundaries of the described event in an untrimmed video. There is a long-standing challenge that multiple moments may be associated with one same video-query pair, termed label uncertainty. However, existing methods struggle to localize diverse moments due to the lack of multi-label annotations. In this paper, we propose a novel Diverse Temporal Grounding framework (DTG) to achieve diverse moment localization with only single-label annotations. By delving into the label uncertainty, we find the diverse moments retrieved tend to involve similar actions/objects, driving us to perceive these interest moments. 
Specifically, we construct soft multi-label through semantic similarity of multiple video-query pairs. These soft labels reveal whether multiple moments in the intra-videos contain similar verbs/nouns, thereby guiding interest moment generation. Meanwhile, we put forward a diverse moment regression network (DMRNet) to achieve multiple predictions in a single pass, where plausible moments are dynamically picked out from the interest moments for joint optimization. Moreover, we introduce new metrics that better reveal multi-output performance. Extensive experiments conducted on Charades-STA and ActivityNet Captions show that our method achieves state-of-the-art performance in terms of both standard and new metrics.</description><subject>Annotations</subject><subject>Grounding</subject><subject>label uncertainty</subject><subject>Measurement</subject><subject>moment localization</subject><subject>Object recognition</subject><subject>Optimization</subject><subject>Predictive models</subject><subject>Queries</subject><subject>Query languages</subject><subject>Task analysis</subject><subject>Temporal grounding</subject><subject>Uncertainty</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kEFPwzAMhSMEEmPwB-ASiXNH4iRNyw0NGJOGQKLbtUobFzJYOpIOaf-ejk2cbFl-z88fIZecjThn-U0xflsUI2AAI8F1Lrg8IgOuVJYAMHXc90zxJAOuTslZjEvGuMykHpBF8eH8p_PvdOqjs0jnvsbQGee77W0_6zBg7Ohzu0Lf0VcMNa4713ratIHeux8MEWmBq3UbzBedhHbjbe92Tk4a8xXx4lCHZP74UIyfktnLZDq-myU15KpLLGemMlartJJWaGvrJs1FLmrbf8ENGkhTC7lkAitgvJJghK6kqCWvlWaVGJLrve86tN-bPmm5bDfB9ydL0MAlZNDTGBLYb9WhjTFgU66DW5mwLTkrd_zKP37ljl954NeLrvYih4j_glxnSmspfgGAAGzV</recordid><startdate>20221001</startdate><enddate>20221001</enddate><creator>Zhou, Hao</creator><creator>Zhang, Chongyang</creator><creator>Luo, Yan</creator><creator>Hu, Chuanping</creator><creator>Zhang, 
Wenjun</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-8799-1182</orcidid><orcidid>https://orcid.org/0000-0002-0173-0393</orcidid><orcidid>https://orcid.org/0000-0001-7292-0445</orcidid><orcidid>https://orcid.org/0000-0002-1394-4452</orcidid></search><sort><creationdate>20221001</creationdate><title>Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding</title><author>Zhou, Hao ; Zhang, Chongyang ; Luo, Yan ; Hu, Chuanping ; Zhang, Wenjun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c295t-d10abad756b4d37ddcf69393cd3171aea266d29403eb201b42a37b43c41c570b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Annotations</topic><topic>Grounding</topic><topic>label uncertainty</topic><topic>Measurement</topic><topic>moment localization</topic><topic>Object recognition</topic><topic>Optimization</topic><topic>Predictive models</topic><topic>Queries</topic><topic>Query languages</topic><topic>Task analysis</topic><topic>Temporal grounding</topic><topic>Uncertainty</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Hao</creatorcontrib><creatorcontrib>Zhang, Chongyang</creatorcontrib><creatorcontrib>Luo, Yan</creatorcontrib><creatorcontrib>Hu, Chuanping</creatorcontrib><creatorcontrib>Zhang, Wenjun</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library 
(IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhou, Hao</au><au>Zhang, Chongyang</au><au>Luo, Yan</au><au>Hu, Chuanping</au><au>Zhang, Wenjun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding</atitle><jtitle>IEEE transactions on circuits and systems for video technology</jtitle><stitle>TCSVT</stitle><date>2022-10-01</date><risdate>2022</risdate><volume>32</volume><issue>10</issue><spage>7190</spage><epage>7203</epage><pages>7190-7203</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>Given a language query, temporal grounding task is to localize temporal boundaries of the described event in an untrimmed video. There is a long-standing challenge that multiple moments may be associated with one same video-query pair, termed label uncertainty. However, existing methods struggle to localize diverse moments due to the lack of multi-label annotations. In this paper, we propose a novel Diverse Temporal Grounding framework (DTG) to achieve diverse moment localization with only single-label annotations. 
By delving into the label uncertainty, we find the diverse moments retrieved tend to involve similar actions/objects, driving us to perceive these interest moments. Specifically, we construct soft multi-label through semantic similarity of multiple video-query pairs. These soft labels reveal whether multiple moments in the intra-videos contain similar verbs/nouns, thereby guiding interest moment generation. Meanwhile, we put forward a diverse moment regression network (DMRNet) to achieve multiple predictions in a single pass, where plausible moments are dynamically picked out from the interest moments for joint optimization. Moreover, we introduce new metrics that better reveal multi-output performance. Extensive experiments conducted on Charades-STA and ActivityNet Captions show that our method achieves state-of-the-art performance in terms of both standard and new metrics.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2022.3179314</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-8799-1182</orcidid><orcidid>https://orcid.org/0000-0002-0173-0393</orcidid><orcidid>https://orcid.org/0000-0001-7292-0445</orcidid><orcidid>https://orcid.org/0000-0002-1394-4452</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1051-8215
ispartof	IEEE transactions on circuits and systems for video technology, 2022-10, Vol.32 (10), p.7190-7203
issn	1051-8215 1558-2205
language	eng
recordid	cdi_ieee_primary_9785774
source	IEEE Electronic Library (IEL)
subjects	Annotations; Grounding; label uncertainty; Measurement; moment localization; Object recognition; Optimization; Predictive models; Queries; Query languages; Task analysis; Temporal grounding; Uncertainty
title	Thinking Inside Uncertainty: Interest Moment Perception for Diverse Temporal Grounding
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-19T19%3A30%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Thinking%20Inside%20Uncertainty:%20Interest%20Moment%20Perception%20for%20Diverse%20Temporal%20Grounding&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Zhou,%20Hao&rft.date=2022-10-01&rft.volume=32&rft.issue=10&rft.spage=7190&rft.epage=7203&rft.pages=7190-7203&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2022.3179314&rft_dat=%3Cproquest_RIE%3E2721428279%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2721428279&rft_id=info:pmid/&rft_ieee_id=9785774&rfr_iscdi=true
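
The `url` field above is an OpenURL link (its `ctx_ver` parameter names Z39.88-2004) whose query string repeats the citation as percent-encoded `rft.*` key/value pairs. As a minimal sketch of how those pairs could be read back out, the following uses only Python's standard `urllib.parse`; the function name `openurl_metadata` is hypothetical, and the example URL is a shortened copy of the one in this record, kept to a few parameters for readability.

```python
from urllib.parse import parse_qs, urlsplit


def openurl_metadata(url: str) -> dict:
    """Collect the 'rft.*' referent fields of an OpenURL into a plain dict."""
    params = parse_qs(urlsplit(url).query)  # percent-decoding is done here
    prefix = "rft."
    # Keep the first value of each referent key and drop the 'rft.' prefix.
    return {key[len(prefix):]: values[0]
            for key, values in params.items()
            if key.startswith(prefix)}


# Shortened version of the record's URL (assumption: only some parameters kept).
example_url = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=Thinking%20Inside%20Uncertainty:%20Interest%20Moment"
    "%20Perception%20for%20Diverse%20Temporal%20Grounding"
    "&rft.volume=32&rft.issue=10&rft.spage=7190&rft.epage=7203"
    "&rft.issn=1051-8215"
)

meta = openurl_metadata(example_url)
print(meta["atitle"])
print(meta["volume"], meta["issue"], meta["spage"], meta["epage"])
```

Keys such as `rft_id` and `rft_dat` use an underscore rather than a dot, so this sketch skips them; a fuller reader would handle those separately.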