Transductive Video Segmentation on Tree-Structured Model
This paper presents a transductive multicomponent video segmentation algorithm, which is capable of segmenting the predefined object of interest in the frames of a video sequence. To ensure temporal consistency, a temporal coherent parametric min-cut algorithm is developed to generate segmentation hypotheses based on visual cues and motion cues. Furthermore, each hypothesis is evaluated by an energy function from foreground resemblance, foreground/background divergence, boundary strength, and visual saliency. In particular, the state-of-the-art R-convolutional neural network descriptor is leveraged to encode the visual appearance of the foreground object. Finally, the optimal segmentation of the frame can be attained by assembling the segmentation hypotheses through the Monte Carlo approximation. In particular, multiple foreground components are built to capture the variances of the foreground object in shapes and poses. To group the frames into different components, a tree-structured graphical model named temporal tree is designed, where visually similar and temporally coherent frames are arranged in branches. The temporal tree can be constructed by iteratively adding frames to the active nodes by probabilistic clustering. In addition, each component, consisting of frames in the same branch, is characterized by a support vector machine classifier, which is learned in a transductive fashion by jointly maximizing the margin over the labeled frames and the unlabeled frames. As the frames from the same video sequence follow the same distribution, the transductive classifiers achieve stronger generalization capability than inductive ones. Experimental results on the public benchmarks demonstrate the effectiveness of the proposed method in comparison with other state-of-the-art supervised and unsupervised video segmentation methods.
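The hypothesis-assembly step in the abstract (weighting segmentation hypotheses by an energy score and combining them) can be sketched minimally as follows. The exponential energy weighting, the function names, and the nested-list mask format are assumptions for illustration, not the paper's actual Monte Carlo approximation:

```python
import math

def assemble_segmentation(hypotheses, energies):
    """Combine binary segmentation hypotheses into one final mask.

    Each hypothesis (a 2-D list of 0/1 pixels) is weighted by
    exp(-energy), so lower-energy hypotheses contribute more; this
    weighting is a hypothetical stand-in for the paper's scheme.
    """
    weights = [math.exp(-e) for e in energies]
    z = sum(weights)
    h, w = len(hypotheses[0]), len(hypotheses[0][0])
    soft = [[0.0] * w for _ in range(h)]
    for mask, wt in zip(hypotheses, weights):
        for i in range(h):
            for j in range(w):
                soft[i][j] += wt * mask[i][j] / z
    # Threshold the soft foreground map at 0.5 for a binary result.
    return [[1 if v >= 0.5 else 0 for v in row] for row in soft]
```

With two conflicting 2x2 hypotheses, the lower-energy mask dominates wherever they disagree, which is the intended effect of energy-weighted assembly.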
Published in: | IEEE transactions on circuits and systems for video technology, 2017-05, Vol.27 (5), p.992-1005 |
Main authors: | Botao Wang; Zhihui Fu; Hongkai Xiong; Zheng, Yuan F. |
Format: | Article |
Language: | English |
container_end_page | 1005 |
container_issue | 5 |
container_start_page | 992 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 27 |
creator | Botao Wang; Zhihui Fu; Hongkai Xiong; Zheng, Yuan F. |
description | This paper presents a transductive multicomponent video segmentation algorithm, which is capable of segmenting the predefined object of interest in the frames of a video sequence. To ensure temporal consistency, a temporal coherent parametric min-cut algorithm is developed to generate segmentation hypotheses based on visual cues and motion cues. Furthermore, each hypothesis is evaluated by an energy function from foreground resemblance, foreground/background divergence, boundary strength, and visual saliency. In particular, the state-of-the-art R-convolutional neural network descriptor is leveraged to encode the visual appearance of the foreground object. Finally, the optimal segmentation of the frame can be attained by assembling the segmentation hypotheses through the Monte Carlo approximation. In particular, multiple foreground components are built to capture the variances of the foreground object in shapes and poses. To group the frames into different components, a tree-structured graphical model named temporal tree is designed, where visually similar and temporally coherent frames are arranged in branches. The temporal tree can be constructed by iteratively adding frames to the active nodes by probabilistic clustering. In addition, each component, consisting of frames in the same branch, is characterized by a support vector machine classifier, which is learned in a transductive fashion by jointly maximizing the margin over the labeled frames and the unlabeled frames. As the frames from the same video sequence follow the same distribution, the transductive classifiers achieve stronger generalization capability than inductive ones. Experimental results on the public benchmarks demonstrate the effectiveness of the proposed method in comparison with other state-of-the-art supervised and unsupervised video segmentation methods. |
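The description's claim that transductive classifiers generalize better than inductive ones (because labeled and unlabeled frames come from the same distribution) can be illustrated with a toy 1-D threshold classifier standing in for the paper's SVM. The pseudo-labeling loop and all names here are assumptions for illustration only:

```python
def margin_threshold(neg, pos):
    # Midpoint between the two classes: the maximum-margin decision
    # threshold for separable 1-D data.
    return (max(neg) + min(pos)) / 2.0

def transductive_threshold(labeled, unlabeled, iters=5):
    """Toy transductive learner: fit on labeled points, pseudo-label
    the unlabeled points, then refit so the margin is maximized over
    labeled AND unlabeled data jointly (not the paper's TSVM).

    labeled: list of (x, y) pairs with y in {-1, +1}; unlabeled: xs.
    """
    thr = margin_threshold([x for x, y in labeled if y < 0],
                           [x for x, y in labeled if y > 0])
    for _ in range(iters):
        pseudo = [(x, 1 if x > thr else -1) for x in unlabeled]
        joint = labeled + pseudo
        thr = margin_threshold([x for x, y in joint if y < 0],
                               [x for x, y in joint if y > 0])
    return thr
```

With labeled background points {0, 1} and foreground points {6, 7}, the purely inductive margin midpoint is 3.5; adding unlabeled points 2.0, 2.5, and 5.5 shifts the learned threshold to 4.0, the margin midpoint over labeled and unlabeled data together, which is the transductive effect the description refers to.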
doi_str_mv | 10.1109/TCSVT.2016.2527378 |
format | Article |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2017-05, Vol.27 (5), p.992-1005 |
issn | 1051-8215; 1558-2205 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2174467547 |
source | IEEE Electronic Library (IEL) |
subjects | Algorithms; Artificial neural networks; Classifiers; Clustering; Computer simulation; Divergence; Frames; Hypotheses; Image segmentation; Monte Carlo approximation; Motion segmentation; Object segmentation; Optimization; parametric min-cut; Proposals; Robustness; Segmentation; State of the art; Support vector machines; temporal tree; transductive learning; Video data; video segmentation; Video sequences; Visualization |
title | Transductive Video Segmentation on Tree-Structured Model |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-17T20%3A01%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Transductive%20Video%20Segmentation%20on%20Tree-Structured%20Model&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Botao%20Wang&rft.date=2017-05-01&rft.volume=27&rft.issue=5&rft.spage=992&rft.epage=1005&rft.pages=992-1005&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2016.2527378&rft_dat=%3Cproquest_RIE%3E2174467547%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2174467547&rft_id=info:pmid/&rft_ieee_id=7401019&rfr_iscdi=true |