Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels

Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-su...

Bibliographic Details
Published in:	IEEE transactions on pattern analysis and machine intelligence 2023-02, Vol.45 (2), p.2595-2612
Main authors:	Seong, Hongje; Hyun, Junhyuk; Kim, Euntai
Format:	Article
Language:	eng
Subjects:	Automobiles; Correlation; gaussian kernel; hide-and-seek; Image segmentation; Kernel; Kernels; Matching; memory network; Networks; Object segmentation; Queries; Segmentation; Target masking; Task analysis; Training; Video object segmentation
Online access:	Order full text

container_end_page	2612
container_issue	2
container_start_page	2595
container_title	IEEE transactions on pattern analysis and machine intelligence
container_volume	45
creator	Seong, Hongje; Hyun, Junhyuk; Kim, Euntai
description	Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN$^{M}$). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in $\mathcal{J_{M}}$ on DAVIS 2017 test-dev set). The runtimes of our proposed KMN and KMN$^{M}$ on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM.
doi_str_mv	10.1109/TPAMI.2022.3163375
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_35353695</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9745367</ieee_id><sourcerecordid>2645857880</sourcerecordid><originalsourceid>FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</originalsourceid><addsrcrecordid>eNpdkE1PGzEQhi1URALtH6BStVIvXDa1PfZ6fYxQCwhCKjUpR8vZnU2d7kdq7wrBr8eQkENPc3ifdzTzEHLO6IQxqr8tfk5nNxNOOZ8AywCUPCJjpkGnIEF_IGPKMp7mOc9H5DSEDaVMSAonZBRzCZmWYzL_7Urskvlqg0Wf_MJ1g21ve9e1yTK4dp3com-xds9YJjNsOv-U3GP_2Pm_yYPr_ySzoe7dtsY9Fz6S48rWAT_t5xlZ_vi-uLxO7-ZXN5fTu7QAyfoUrBAlam21grKEQq9UyXT8QbOCS65YJbVgwopKg-KRKWwp6KqiFbMAeQ5n5GK3d-u7fwOG3jQuFFjXtsVuCIZnQuZS5TmN6Nf_0E03-DZeZ7jKGCjBJI8U31GF70LwWJmtd431T4ZR86rbvOk2r7rNXncsfdmvHlYNlofKu98IfN4BDhEPsVYixgpeAKxAgoQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2761374152</pqid></control><display><type>article</type><title>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</title><source>IEEE Electronic Library (IEL)</source><creator>Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai</creator><creatorcontrib>Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai</creatorcontrib><description><![CDATA[Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. 
To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq1-3163375.gif"/> </inline-formula>). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in <inline-formula><tex-math notation="LaTeX">\mathcal {J_{M}}</tex-math> <mml:math><mml:msub><mml:mi mathvariant="script">J</mml:mi><mml:mi mathvariant="script">M</mml:mi></mml:msub></mml:math><inline-graphic xlink:href="kim-ieq2-3163375.gif"/> </inline-formula> on DAVIS 2017 test-dev set). 
The runtimes of our proposed KMN and KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq3-3163375.gif"/> </inline-formula> on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM.]]></description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2022.3163375</identifier><identifier>PMID: 35353695</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Automobiles ; Correlation ; gaussian kernel ; hide-and-seek ; Image segmentation ; Kernel ; Kernels ; Matching ; memory network ; Networks ; Object segmentation ; Queries ; Segmentation ; Target masking ; Task analysis ; Training ; Video object segmentation</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2023-02, Vol.45 (2), p.2595-2612</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</citedby><cites>FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</cites><orcidid>0000-0002-0975-8390 ; 0000-0001-7221-409X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9745367$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27915,27916,54749</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9745367$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35353695$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Seong, Hongje</creatorcontrib><creatorcontrib>Hyun, Junhyuk</creatorcontrib><creatorcontrib>Kim, Euntai</creatorcontrib><title>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description><![CDATA[Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. 
To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq1-3163375.gif"/> </inline-formula>). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in <inline-formula><tex-math notation="LaTeX">\mathcal {J_{M}}</tex-math> <mml:math><mml:msub><mml:mi mathvariant="script">J</mml:mi><mml:mi mathvariant="script">M</mml:mi></mml:msub></mml:math><inline-graphic xlink:href="kim-ieq2-3163375.gif"/> </inline-formula> on DAVIS 2017 test-dev set). 
The runtimes of our proposed KMN and KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq3-3163375.gif"/> </inline-formula> on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM.]]></description><subject>Automobiles</subject><subject>Correlation</subject><subject>gaussian kernel</subject><subject>hide-and-seek</subject><subject>Image segmentation</subject><subject>Kernel</subject><subject>Kernels</subject><subject>Matching</subject><subject>memory network</subject><subject>Networks</subject><subject>Object segmentation</subject><subject>Queries</subject><subject>Segmentation</subject><subject>Target masking</subject><subject>Task analysis</subject><subject>Training</subject><subject>Video object segmentation</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE1PGzEQhi1URALtH6BStVIvXDa1PfZ6fYxQCwhCKjUpR8vZnU2d7kdq7wrBr8eQkENPc3ifdzTzEHLO6IQxqr8tfk5nNxNOOZ8AywCUPCJjpkGnIEF_IGPKMp7mOc9H5DSEDaVMSAonZBRzCZmWYzL_7Urskvlqg0Wf_MJ1g21ve9e1yTK4dp3com-xds9YJjNsOv-U3GP_2Pm_yYPr_ySzoe7dtsY9Fz6S48rWAT_t5xlZ_vi-uLxO7-ZXN5fTu7QAyfoUrBAlam21grKEQq9UyXT8QbOCS65YJbVgwopKg-KRKWwp6KqiFbMAeQ5n5GK3d-u7fwOG3jQuFFjXtsVuCIZnQuZS5TmN6Nf_0E03-DZeZ7jKGCjBJI8U31GF70LwWJmtd431T4ZR86rbvOk2r7rNXncsfdmvHlYNlofKu98IfN4BDhEPsVYixgpeAKxAgoQ</recordid><startdate>20230201</startdate><enddate>20230201</enddate><creator>Seong, Hongje</creator><creator>Hyun, Junhyuk</creator><creator>Kim, Euntai</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-0975-8390</orcidid><orcidid>https://orcid.org/0000-0001-7221-409X</orcidid></search><sort><creationdate>20230201</creationdate><title>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</title><author>Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Automobiles</topic><topic>Correlation</topic><topic>gaussian kernel</topic><topic>hide-and-seek</topic><topic>Image segmentation</topic><topic>Kernel</topic><topic>Kernels</topic><topic>Matching</topic><topic>memory network</topic><topic>Networks</topic><topic>Object segmentation</topic><topic>Queries</topic><topic>Segmentation</topic><topic>Target masking</topic><topic>Task analysis</topic><topic>Training</topic><topic>Video object segmentation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Seong, Hongje</creatorcontrib><creatorcontrib>Hyun, Junhyuk</creatorcontrib><creatorcontrib>Kim, Euntai</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Seong, Hongje</au><au>Hyun, Junhyuk</au><au>Kim, Euntai</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2023-02-01</date><risdate>2023</risdate><volume>45</volume><issue>2</issue><spage>2595</spage><epage>2612</epage><pages>2595-2612</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract><![CDATA[Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. 
To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq1-3163375.gif"/> </inline-formula>). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in <inline-formula><tex-math notation="LaTeX">\mathcal {J_{M}}</tex-math> <mml:math><mml:msub><mml:mi mathvariant="script">J</mml:mi><mml:mi mathvariant="script">M</mml:mi></mml:msub></mml:math><inline-graphic xlink:href="kim-ieq2-3163375.gif"/> </inline-formula> on DAVIS 2017 test-dev set). The runtimes of our proposed KMN and KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq3-3163375.gif"/> </inline-formula> on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM.]]></abstract><cop>United States</cop><pub>IEEE</pub><pmid>35353695</pmid><doi>10.1109/TPAMI.2022.3163375</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-0975-8390</orcidid><orcidid>https://orcid.org/0000-0001-7221-409X</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0162-8828
ispartof	IEEE transactions on pattern analysis and machine intelligence, 2023-02, Vol.45 (2), p.2595-2612
issn	0162-8828 1939-3539 2160-9292
language	eng
recordid	cdi_pubmed_primary_35353695
source	IEEE Electronic Library (IEL)
subjects	Automobiles; Correlation; gaussian kernel; hide-and-seek; Image segmentation; Kernel; Kernels; Matching; memory network; Networks; Object segmentation; Queries; Segmentation; Target masking; Task analysis; Training; Video object segmentation
title	Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T04%3A55%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Video%20Object%20Segmentation%20Using%20Kernelized%20Memory%20Network%20With%20Multiple%20Kernels&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Seong,%20Hongje&rft.date=2023-02-01&rft.volume=45&rft.issue=2&rft.spage=2595&rft.epage=2612&rft.pages=2595-2612&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2022.3163375&rft_dat=%3Cproquest_RIE%3E2645857880%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2761374152&rft_id=info:pmid/35353695&rft_ieee_id=9745367&rfr_iscdi=true
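
The description field above says that Memory-to-Query matching uses a kernel to make the memory read less non-local. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a 2D Gaussian kernel (the record lists "gaussian kernel" among its subjects) centred on the query position that best matches each memory position. The function name `kernelized_memory_read`, its parameters, the default `sigma`, and the dot-product correlation with a softmax over memory positions are all illustrative assumptions, none of which this record specifies.

```python
import math


def kernelized_memory_read(query_keys, memory_keys, memory_values,
                           height, width, sigma=7.0):
    """Toy kernelized memory read (hypothetical helper, pure Python).

    query_keys:    H*W key vectors of the current frame, in row-major order.
    memory_keys:   P key vectors, one per memory position.
    memory_values: P value vectors, one per memory position.
    Returns H*W read-out vectors, one per query position.
    """
    num_q = height * width
    assert len(query_keys) == num_q
    num_p = len(memory_keys)
    scale = math.sqrt(len(query_keys[0]))
    value_dim = len(memory_values[0])

    # Correlation corr[p][q] between memory position p and query position q.
    corr = [[sum(a * b for a, b in zip(mk, qk)) for qk in query_keys]
            for mk in memory_keys]

    # Memory-to-Query matching (assumed form): for each memory position,
    # find its best-matching query position and centre a Gaussian there.
    kernels = []
    for row in corr:
        best = max(range(num_q), key=row.__getitem__)
        best_y, best_x = divmod(best, width)
        kernels.append([
            math.exp(-((q // width - best_y) ** 2 + (q % width - best_x) ** 2)
                     / (2.0 * sigma ** 2))
            for q in range(num_q)
        ])

    # Query-to-Memory matching: softmax over memory positions, with each
    # term reweighted by the kernel before normalisation.
    readout = []
    for q in range(num_q):
        col_max = max(corr[p][q] for p in range(num_p))
        weights = [math.exp((corr[p][q] - col_max) / scale) * kernels[p][q]
                   for p in range(num_p)]
        total = sum(weights)
        readout.append([
            sum(w * v[d] for w, v in zip(weights, memory_values)) / total
            for d in range(value_dim)
        ])
    return readout


# Two query positions on a 1x2 grid and two memory positions.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0], [2.0]]
local_read = kernelized_memory_read(keys, keys, values, 1, 2, sigma=0.1)
global_read = kernelized_memory_read(keys, keys, values, 1, 2, sigma=1e6)
```

With a very large `sigma` the kernel is almost flat and the read reduces to an ordinary softmax over all memory positions. With a small `sigma` each query position reads almost only from memory positions whose best match lies nearby, which is the locality the description refers to.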