Transductive Video Segmentation on Tree-Structured Model
This paper presents a transductive multicomponent video segmentation algorithm, which is capable of segmenting the predefined object of interest in the frames of a video sequence. To ensure temporal consistency, a temporal coherent parametric min-cut algorithm is developed to generate segmentation hypotheses based on visual cues and motion cues. Furthermore, each hypothesis is evaluated by an energy function from foreground resemblance, foreground/background divergence, boundary strength, and visual saliency. In particular, the state-of-the-art R-convolutional neural network descriptor is leveraged to encode the visual appearance of the foreground object. Finally, the optimal segmentation of the frame can be attained by assembling the segmentation hypotheses through the Monte Carlo approximation. In particular, multiple foreground components are built to capture the variances of the foreground object in shapes and poses. To group the frames into different components, a tree-structured graphical model named temporal tree is designed, where visually similar and temporally coherent frames are arranged in branches. The temporal tree can be constructed by iteratively adding frames to the active nodes by probabilistic clustering. In addition, each component, consisting of frames in the same branch, is characterized by a support vector machine classifier, which is learned in a transductive fashion by jointly maximizing the margin over the labeled frames and the unlabeled frames. As the frames from the same video sequence follow the same distribution, the transductive classifiers achieve stronger generalization capability than inductive ones. Experimental results on the public benchmarks demonstrate the effectiveness of the proposed method in comparison with other state-of-the-art supervised and unsupervised video segmentation methods.
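The hypothesis-assembly step in the abstract (weighting segmentation hypotheses by an energy score and combining them) can be sketched minimally as follows. The exponential energy weighting, the function names, and the nested-list mask format are assumptions for illustration, not the paper's actual Monte Carlo approximation:

```python
import math

def assemble_segmentation(hypotheses, energies):
    """Combine binary segmentation hypotheses into one final mask.

    Each hypothesis (a 2-D list of 0/1 pixels) is weighted by
    exp(-energy), so lower-energy hypotheses contribute more; this
    weighting is a hypothetical stand-in for the paper's scheme.
    """
    weights = [math.exp(-e) for e in energies]
    z = sum(weights)
    h, w = len(hypotheses[0]), len(hypotheses[0][0])
    soft = [[0.0] * w for _ in range(h)]
    for mask, wt in zip(hypotheses, weights):
        for i in range(h):
            for j in range(w):
                soft[i][j] += wt * mask[i][j] / z
    # Threshold the soft foreground map at 0.5 for a binary result.
    return [[1 if v >= 0.5 else 0 for v in row] for row in soft]
```

With two conflicting 2x2 hypotheses, the lower-energy mask dominates wherever they disagree, which is the intended effect of energy-weighted assembly.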
Published in: | IEEE transactions on circuits and systems for video technology, 2017-05, Vol.27 (5), p.992-1005 |
Main authors: | Botao Wang; Zhihui Fu; Hongkai Xiong; Zheng, Yuan F. |
Format: | Article |
Language: | English |
container_end_page | 1005 |
container_issue | 5 |
container_start_page | 992 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 27 |
creator | Botao Wang; Zhihui Fu; Hongkai Xiong; Zheng, Yuan F. |
description | This paper presents a transductive multicomponent video segmentation algorithm, which is capable of segmenting the predefined object of interest in the frames of a video sequence. To ensure temporal consistency, a temporal coherent parametric min-cut algorithm is developed to generate segmentation hypotheses based on visual cues and motion cues. Furthermore, each hypothesis is evaluated by an energy function from foreground resemblance, foreground/background divergence, boundary strength, and visual saliency. In particular, the state-of-the-art R-convolutional neural network descriptor is leveraged to encode the visual appearance of the foreground object. Finally, the optimal segmentation of the frame can be attained by assembling the segmentation hypotheses through the Monte Carlo approximation. In particular, multiple foreground components are built to capture the variances of the foreground object in shapes and poses. To group the frames into different components, a tree-structured graphical model named temporal tree is designed, where visually similar and temporally coherent frames are arranged in branches. The temporal tree can be constructed by iteratively adding frames to the active nodes by probabilistic clustering. In addition, each component, consisting of frames in the same branch, is characterized by a support vector machine classifier, which is learned in a transductive fashion by jointly maximizing the margin over the labeled frames and the unlabeled frames. As the frames from the same video sequence follow the same distribution, the transductive classifiers achieve stronger generalization capability than inductive ones. Experimental results on the public benchmarks demonstrate the effectiveness of the proposed method in comparison with other state-of-the-art supervised and unsupervised video segmentation methods. |
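The description's claim that transductive classifiers generalize better than inductive ones (because labeled and unlabeled frames come from the same distribution) can be illustrated with a toy 1-D threshold classifier standing in for the paper's SVM. The pseudo-labeling loop and all names here are assumptions for illustration only:

```python
def margin_threshold(neg, pos):
    # Midpoint between the two classes: the maximum-margin decision
    # threshold for separable 1-D data.
    return (max(neg) + min(pos)) / 2.0

def transductive_threshold(labeled, unlabeled, iters=5):
    """Toy transductive learner: fit on labeled points, pseudo-label
    the unlabeled points, then refit so the margin is maximized over
    labeled AND unlabeled data jointly (not the paper's TSVM).

    labeled: list of (x, y) pairs with y in {-1, +1}; unlabeled: xs.
    """
    thr = margin_threshold([x for x, y in labeled if y < 0],
                           [x for x, y in labeled if y > 0])
    for _ in range(iters):
        pseudo = [(x, 1 if x > thr else -1) for x in unlabeled]
        joint = labeled + pseudo
        thr = margin_threshold([x for x, y in joint if y < 0],
                               [x for x, y in joint if y > 0])
    return thr
```

With labeled background points {0, 1} and foreground points {6, 7}, the purely inductive margin midpoint is 3.5; adding unlabeled points 2.0, 2.5, and 5.5 shifts the learned threshold to 4.0, the margin midpoint over labeled and unlabeled data together, which is the transductive effect the description refers to.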
doi_str_mv | 10.1109/TCSVT.2016.2527378 |
format | Article |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2017-05, Vol.27 (5), p.992-1005 |
issn | 1051-8215; 1558-2205 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2174467547 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms; Artificial neural networks; Classifiers; Clustering; Computer simulation; Divergence; Frames; Hypotheses; Image segmentation; Monte Carlo approximation; Motion segmentation; Object segmentation; Optimization; parametric min-cut; Proposals; Robustness; Segmentation; State of the art; Support vector machines; temporal tree; transductive learning; Video data; video segmentation; Video sequences; Visualization |
title | Transductive Video Segmentation on Tree-Structured Model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T20%3A01%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Transductive%20Video%20Segmentation%20on%20Tree-Structured%20Model&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Botao%20Wang&rft.date=2017-05-01&rft.volume=27&rft.issue=5&rft.spage=992&rft.epage=1005&rft.pages=992-1005&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2016.2527378&rft_dat=%3Cproquest_RIE%3E2174467547%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2174467547&rft_id=info:pmid/&rft_ieee_id=7401019&rfr_iscdi=true |