Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization

A video is understood by users in terms of the entities present in it. Entity Discovery is the task of building an appearance model for each entity (e.g., a person) and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames, and associated wit...

Detailed description

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence 2017-03, Vol.39 (3), p.430-443
Main authors:	Mitra, Adway; Biswas, Soma; Bhattacharyya, Chiranjib
Format:	Article
Language:	eng
Subjects:	Bayes methods; Bayesian analysis; Bayesian nonparametrics; Chinese restaurant process; Clustering; Coherence; Computational modeling; entity discovery; entity-driven video summarization; Feature extraction; Image segmentation; temporal coherence; temporal segmentation; tracklet clustering; Video data; Videos; YouTube

container_end_page	443
container_issue	3
container_start_page	430
container_title	IEEE transactions on pattern analysis and machine intelligence
container_volume	39
creator	Mitra, Adway ; Biswas, Soma ; Bhattacharyya, Chiranjib
description	A video is understood by users in terms of the entities present in it. Entity Discovery is the task of building an appearance model for each entity (e.g., a person) and finding all its occurrences in the video. We represent a video as a sequence of tracklets, each spanning 10-20 frames and associated with one entity. We pose Entity Discovery as tracklet clustering, and approach it by leveraging Temporal Coherence (TC): the property that temporally neighboring tracklets are likely to be associated with the same entity. Our major contributions are the first Bayesian nonparametric models for TC at the tracklet level. We extend the Chinese Restaurant Process (CRP) to TC-CRP, and further to the Temporally Coherent Chinese Restaurant Franchise (TC-CRF), to jointly model entities and temporal segments using mixture components and sparse distributions. For discovering persons in TV serial videos without metadata such as scripts, these methods show considerable improvement over state-of-the-art approaches to tracklet clustering in terms of clustering accuracy, cluster purity and entity coverage. Unlike existing approaches, the proposed methods can perform online tracklet clustering on streaming videos, and can automatically reject false tracklets. Finally, we discuss entity-driven video summarization, where temporal segments of the video are selected based on the discovered entities to create a semantically meaningful summary.
doi_str_mv	10.1109/TPAMI.2016.2557785
format	Article
publisher	United States: IEEE
rights	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017
date	2017-03-01
tpages	14
toplevel	peer_reviewed online_resources
ieee_id	7457669
pmid	27116733
coden	ITPIDJ
orcid	https://orcid.org/0000-0001-6195-1844
linktohtml	https://ieeexplore.ieee.org/document/7457669
backlink	https://www.ncbi.nlm.nih.gov/pubmed/27116733
fulltext	fulltext_linktorsrc
identifier	ISSN: 0162-8828
ispartof	IEEE transactions on pattern analysis and machine intelligence, 2017-03, Vol.39 (3), p.430-443
issn	0162-8828 1939-3539 2160-9292
language	eng
recordid	cdi_crossref_primary_10_1109_TPAMI_2016_2557785
source	IEEE Electronic Library (IEL)
subjects	Bayes methods ; Bayesian analysis ; Bayesian nonparametrics ; Chinese restaurant process ; Clustering ; Coherence ; Computational modeling ; entity discovery ; entity-driven video summarization ; Feature extraction ; Image segmentation ; temporal coherence ; temporal segmentation ; tracklet clustering ; Video data ; Videos ; YouTube
title	Bayesian Modeling of Temporal Coherence in Videos for Entity Discovery and Summarization
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T15%3A39%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bayesian%20Modeling%20of%20Temporal%20Coherence%20in%20Videos%20for%20Entity%20Discovery%20and%20Summarization&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Mitra,%20Adway&rft.date=2017-03-01&rft.volume=39&rft.issue=3&rft.spage=430&rft.epage=443&rft.pages=430-443&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2016.2557785&rft_dat=%3Cproquest_RIE%3E1826669188%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1866376494&rft_id=info:pmid/27116733&rft_ieee_id=7457669&rfr_iscdi=true
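
Illustrative note on the description field: the abstract frames entity discovery as clustering a sequence of tracklets under a Chinese Restaurant Process prior that favors temporal coherence. The sketch below is not the paper's TC-CRP. It is a plain CRP sampler with one assumed modification, a bonus weight on the cluster of the immediately preceding item, meant only to show how a "neighbors tend to share an entity" bias can enter the prior. The function name `sample_sticky_crp` and the parameters `alpha` and `stickiness` are hypothetical, and no appearance features or likelihood terms are modeled.

```python
import random


def sample_sticky_crp(n_items, alpha=1.0, stickiness=5.0, seed=0):
    """Sample cluster labels for a sequence from a CRP prior with an
    assumed temporal-coherence bonus (illustrative, not the paper's model).

    alpha      -- CRP concentration: weight of opening a new cluster (> 0)
    stickiness -- hypothetical extra weight on the previous item's cluster
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index assigned to item i
    counts = []   # counts[k] = number of items already in cluster k
    for _ in range(n_items):
        # Standard CRP weights: existing cluster k gets counts[k],
        # a brand-new cluster gets alpha (last position in the list).
        weights = [float(c) for c in counts] + [alpha]
        if labels:
            # Assumed coherence bias: favor the previous item's cluster.
            weights[labels[-1]] += stickiness
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # the "new cluster" option was chosen
        counts[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    # Larger stickiness should tend to give longer runs of equal labels.
    print(sample_sticky_crp(30, alpha=1.0, stickiness=0.0))
    print(sample_sticky_crp(30, alpha=1.0, stickiness=20.0))
```

Setting `stickiness=0.0` recovers the ordinary CRP prior, which makes it easy to compare label sequences with and without the coherence bias.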