Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels

Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-su...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence 2023-02, Vol.45 (2), p.2595-2612
Main authors: Seong, Hongje, Hyun, Junhyuk, Kim, Euntai
Format: Article
Language: eng
container_end_page 2612
container_issue 2
container_start_page 2595
container_title IEEE transactions on pattern analysis and machine intelligence
container_volume 45
creator Seong, Hongje
Hyun, Junhyuk
Kim, Euntai
description Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN$^{M}$). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in $\mathcal{J}_{\mathcal{M}}$ on DAVIS 2017 test-dev set). The runtimes of our proposed KMN and KMN$^{M}$ on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM.
doi_str_mv 10.1109/TPAMI.2022.3163375
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_35353695</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9745367</ieee_id><sourcerecordid>2645857880</sourcerecordid><originalsourceid>FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</originalsourceid><addsrcrecordid>eNpdkE1PGzEQhi1URALtH6BStVIvXDa1PfZ6fYxQCwhCKjUpR8vZnU2d7kdq7wrBr8eQkENPc3ifdzTzEHLO6IQxqr8tfk5nNxNOOZ8AywCUPCJjpkGnIEF_IGPKMp7mOc9H5DSEDaVMSAonZBRzCZmWYzL_7Urskvlqg0Wf_MJ1g21ve9e1yTK4dp3com-xds9YJjNsOv-U3GP_2Pm_yYPr_ySzoe7dtsY9Fz6S48rWAT_t5xlZ_vi-uLxO7-ZXN5fTu7QAyfoUrBAlam21grKEQq9UyXT8QbOCS65YJbVgwopKg-KRKWwp6KqiFbMAeQ5n5GK3d-u7fwOG3jQuFFjXtsVuCIZnQuZS5TmN6Nf_0E03-DZeZ7jKGCjBJI8U31GF70LwWJmtd431T4ZR86rbvOk2r7rNXncsfdmvHlYNlofKu98IfN4BDhEPsVYixgpeAKxAgoQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2761374152</pqid></control><display><type>article</type><title>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</title><source>IEEE Electronic Library (IEL)</source><creator>Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai</creator><creatorcontrib>Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai</creatorcontrib><description><![CDATA[Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. 
To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq1-3163375.gif"/> </inline-formula>). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in <inline-formula><tex-math notation="LaTeX">\mathcal {J_{M}}</tex-math> <mml:math><mml:msub><mml:mi mathvariant="script">J</mml:mi><mml:mi mathvariant="script">M</mml:mi></mml:msub></mml:math><inline-graphic xlink:href="kim-ieq2-3163375.gif"/> </inline-formula> on DAVIS 2017 test-dev set). 
The runtimes of our proposed KMN and KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq3-3163375.gif"/> </inline-formula> on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM.]]></description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2022.3163375</identifier><identifier>PMID: 35353695</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Automobiles ; Correlation ; gaussian kernel ; hide-and-seek ; Image segmentation ; Kernel ; Kernels ; Matching ; memory network ; Networks ; Object segmentation ; Queries ; Segmentation ; Target masking ; Task analysis ; Training ; Video object segmentation</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2023-02, Vol.45 (2), p.2595-2612</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</citedby><cites>FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</cites><orcidid>0000-0002-0975-8390 ; 0000-0001-7221-409X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9745367$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27915,27916,54749</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9745367$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35353695$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Seong, Hongje</creatorcontrib><creatorcontrib>Hyun, Junhyuk</creatorcontrib><creatorcontrib>Kim, Euntai</creatorcontrib><title>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description><![CDATA[Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. 
To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq1-3163375.gif"/> </inline-formula>). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in <inline-formula><tex-math notation="LaTeX">\mathcal {J_{M}}</tex-math> <mml:math><mml:msub><mml:mi mathvariant="script">J</mml:mi><mml:mi mathvariant="script">M</mml:mi></mml:msub></mml:math><inline-graphic xlink:href="kim-ieq2-3163375.gif"/> </inline-formula> on DAVIS 2017 test-dev set). 
The runtimes of our proposed KMN and KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq3-3163375.gif"/> </inline-formula> on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM.]]></description><subject>Automobiles</subject><subject>Correlation</subject><subject>gaussian kernel</subject><subject>hide-and-seek</subject><subject>Image segmentation</subject><subject>Kernel</subject><subject>Kernels</subject><subject>Matching</subject><subject>memory network</subject><subject>Networks</subject><subject>Object segmentation</subject><subject>Queries</subject><subject>Segmentation</subject><subject>Target masking</subject><subject>Task analysis</subject><subject>Training</subject><subject>Video object segmentation</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE1PGzEQhi1URALtH6BStVIvXDa1PfZ6fYxQCwhCKjUpR8vZnU2d7kdq7wrBr8eQkENPc3ifdzTzEHLO6IQxqr8tfk5nNxNOOZ8AywCUPCJjpkGnIEF_IGPKMp7mOc9H5DSEDaVMSAonZBRzCZmWYzL_7Urskvlqg0Wf_MJ1g21ve9e1yTK4dp3com-xds9YJjNsOv-U3GP_2Pm_yYPr_ySzoe7dtsY9Fz6S48rWAT_t5xlZ_vi-uLxO7-ZXN5fTu7QAyfoUrBAlam21grKEQq9UyXT8QbOCS65YJbVgwopKg-KRKWwp6KqiFbMAeQ5n5GK3d-u7fwOG3jQuFFjXtsVuCIZnQuZS5TmN6Nf_0E03-DZeZ7jKGCjBJI8U31GF70LwWJmtd431T4ZR86rbvOk2r7rNXncsfdmvHlYNlofKu98IfN4BDhEPsVYixgpeAKxAgoQ</recordid><startdate>20230201</startdate><enddate>20230201</enddate><creator>Seong, Hongje</creator><creator>Hyun, Junhyuk</creator><creator>Kim, Euntai</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-0975-8390</orcidid><orcidid>https://orcid.org/0000-0001-7221-409X</orcidid></search><sort><creationdate>20230201</creationdate><title>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</title><author>Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Automobiles</topic><topic>Correlation</topic><topic>gaussian kernel</topic><topic>hide-and-seek</topic><topic>Image segmentation</topic><topic>Kernel</topic><topic>Kernels</topic><topic>Matching</topic><topic>memory network</topic><topic>Networks</topic><topic>Object segmentation</topic><topic>Queries</topic><topic>Segmentation</topic><topic>Target masking</topic><topic>Task analysis</topic><topic>Training</topic><topic>Video object segmentation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Seong, Hongje</creatorcontrib><creatorcontrib>Hyun, Junhyuk</creatorcontrib><creatorcontrib>Kim, Euntai</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research 
Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Seong, Hongje</au><au>Hyun, Junhyuk</au><au>Kim, Euntai</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2023-02-01</date><risdate>2023</risdate><volume>45</volume><issue>2</issue><spage>2595</spage><epage>2612</epage><pages>2595-2612</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract><![CDATA[Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. 
To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq1-3163375.gif"/> </inline-formula>). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in <inline-formula><tex-math notation="LaTeX">\mathcal {J_{M}}</tex-math> <mml:math><mml:msub><mml:mi mathvariant="script">J</mml:mi><mml:mi mathvariant="script">M</mml:mi></mml:msub></mml:math><inline-graphic xlink:href="kim-ieq2-3163375.gif"/> </inline-formula> on DAVIS 2017 test-dev set). The runtimes of our proposed KMN and KMN<inline-formula><tex-math notation="LaTeX">^{M}</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>M</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="kim-ieq3-3163375.gif"/> </inline-formula> on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM.]]></abstract><cop>United States</cop><pub>IEEE</pub><pmid>35353695</pmid><doi>10.1109/TPAMI.2022.3163375</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-0975-8390</orcidid><orcidid>https://orcid.org/0000-0001-7221-409X</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0162-8828
ispartof IEEE transactions on pattern analysis and machine intelligence, 2023-02, Vol.45 (2), p.2595-2612
issn 0162-8828
1939-3539
2160-9292
language eng
recordid cdi_pubmed_primary_35353695
source IEEE Electronic Library (IEL)
subjects Automobiles
Correlation
gaussian kernel
hide-and-seek
Image segmentation
Kernel
Kernels
Matching
memory network
Networks
Object segmentation
Queries
Segmentation
Target masking
Task analysis
Training
Video object segmentation
title Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T04%3A55%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Video%20Object%20Segmentation%20Using%20Kernelized%20Memory%20Network%20With%20Multiple%20Kernels&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Seong,%20Hongje&rft.date=2023-02-01&rft.volume=45&rft.issue=2&rft.spage=2595&rft.epage=2612&rft.pages=2595-2612&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2022.3163375&rft_dat=%3Cproquest_RIE%3E2645857880%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2761374152&rft_id=info:pmid/35353695&rft_ieee_id=9745367&rfr_iscdi=true