Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels
Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-su...
Published in: | IEEE transactions on pattern analysis and machine intelligence 2023-02, Vol.45 (2), p.2595-2612 |
---|---|
Main authors: | Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai |
Format: | Article |
Language: | eng |
|
container_end_page | 2612 |
---|---|
container_issue | 2 |
container_start_page | 2595 |
container_title | IEEE transactions on pattern analysis and machine intelligence |
container_volume | 45 |
creator | Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai |
description | Semi-supervised video object segmentation (VOS) is to predict the segment of a target object in a video when a ground truth segmentation mask for the target is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising approach for semi-supervised VOS. However, an important point has been overlooked in applying STM to VOS: The solution (=STM) is non-local, but the problem (=VOS) is predominantly local. To solve this mismatch between STM and VOS, we propose new VOS networks called kernelized memory network (KMN) and KMN with multiple kernels (KMN$^{M}$). Our networks conduct not only Query-to-Memory matching but also Memory-to-Query matching. In Memory-to-Query matching, a kernel is employed to reduce the degree of non-localness of the STM. In addition, we present a Hide-and-Seek strategy in pre-training to handle occlusions effectively. The proposed networks surpass the state-of-the-art results on standard benchmarks by a significant margin (+4% in $\mathcal{J}_{\mathcal{M}}$ on DAVIS 2017 test-dev set). The runtimes of our proposed KMN and KMN$^{M}$ on DAVIS 2016 validation set are 0.12 and 0.13 seconds per frame, respectively, and the two networks have similar computation times to STM. |
doi_str_mv | 10.1109/TPAMI.2022.3163375 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_35353695</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9745367</ieee_id><sourcerecordid>2645857880</sourcerecordid><originalsourceid>FETCH-LOGICAL-c351t-3a44de99a973dd3c9b7d1963391c25271f59414a4f937273dcad40bf0f1a33883</originalsourceid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2761374152</pqid></control><display><type>article</type><title>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</title><source>IEEE Electronic Library (IEL)</source><creator>Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai</creator><creatorcontrib>Seong, Hongje ; Hyun, Junhyuk ; Kim, Euntai</creatorcontrib><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2022.3163375</identifier><identifier>PMID: 35353695</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2023-02, Vol.45 (2), p.2595-2612</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcid>0000-0002-0975-8390 ; 0000-0001-7221-409X</orcid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9745367$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27915,27916,54749</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9745367$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35353695$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Seong, Hongje</au><au>Hyun, Junhyuk</au><au>Kim, Euntai</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2023-02-01</date><risdate>2023</risdate><volume>45</volume><issue>2</issue><spage>2595</spage><epage>2612</epage><pages>2595-2612</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><cop>United States</cop><pub>IEEE</pub><pmid>35353695</pmid><doi>10.1109/TPAMI.2022.3163375</doi><tpages>18</tpages><orcidid>https://orcid.org/0000-0002-0975-8390</orcidid><orcidid>https://orcid.org/0000-0001-7221-409X</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0162-8828 |
ispartof | IEEE transactions on pattern analysis and machine intelligence, 2023-02, Vol.45 (2), p.2595-2612 |
issn | 0162-8828 ; 1939-3539 ; 2160-9292 |
language | eng |
recordid | cdi_pubmed_primary_35353695 |
source | IEEE Electronic Library (IEL) |
subjects | Automobiles ; Correlation ; gaussian kernel ; hide-and-seek ; Image segmentation ; Kernel ; Kernels ; Matching ; memory network ; Networks ; Object segmentation ; Queries ; Segmentation ; Target masking ; Task analysis ; Training ; Video object segmentation |
title | Video Object Segmentation Using Kernelized Memory Network With Multiple Kernels |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T04%3A55%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Video%20Object%20Segmentation%20Using%20Kernelized%20Memory%20Network%20With%20Multiple%20Kernels&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Seong,%20Hongje&rft.date=2023-02-01&rft.volume=45&rft.issue=2&rft.spage=2595&rft.epage=2612&rft.pages=2595-2612&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2022.3163375&rft_dat=%3Cproquest_RIE%3E2645857880%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2761374152&rft_id=info:pmid/35353695&rft_ieee_id=9745367&rfr_iscdi=true |
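
The abstract describes Memory-to-Query matching only in words: a kernel limits how far a memory location's influence can spread over the query frame. The NumPy sketch below shows one way such a read could be written. It assumes a Gaussian kernel (the record lists "gaussian kernel" as a subject) centred on each memory location's best-matching query location. The function name `kernelized_memory_read`, the array shapes, and the default `sigma` are illustrative assumptions, not details taken from this record or from the authors' code.

```python
import numpy as np


def kernelized_memory_read(key_m, val_m, key_q, q_hw, sigma=7.0):
    """Hypothetical kernel-weighted memory read (illustrative only).

    key_m: (P, d) memory keys, one row per memory location
    val_m: (P, c) memory values
    key_q: (Q, d) query keys, with Q = H * W in row-major order
    q_hw:  (H, W) spatial size of the query frame
    sigma: assumed standard deviation of the Gaussian kernel, in pixels
    """
    H, W = q_hw
    d = key_m.shape[1]

    # Correlation between every memory location and every query location.
    corr = key_m @ key_q.T                                # (P, Q)

    # Memory-to-Query matching: best query location for each memory location.
    best = corr.argmax(axis=1)                            # (P,)
    best_y, best_x = np.divmod(best, W)

    # Gaussian kernel on the query grid, centred on each best match.
    ys, xs = np.divmod(np.arange(H * W), W)               # (Q,), (Q,)
    dist2 = (ys[None, :] - best_y[:, None]) ** 2 \
          + (xs[None, :] - best_x[:, None]) ** 2          # (P, Q)
    kernel = np.exp(-dist2 / (2.0 * sigma ** 2))

    # Query-to-Memory matching: softmax over memory locations,
    # with each term damped by the kernel before normalisation.
    logits = corr / np.sqrt(d)
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(logits) * kernel
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Each query location reads a convex combination of memory values.
    return weights.T @ val_m                              # (Q, c)
```

With a very large `sigma` the kernel is nearly flat, so the function falls back to an ordinary softmax-weighted read over all memory locations, which is the non-local behaviour the abstract contrasts with. Smaller values confine each memory location's contribution to the neighbourhood of its best match.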