Hierarchical Context Modeling for Video Event Recognition

Current video event recognition research remains largely target-centered. For real-world surveillance videos, target-centered event recognition faces great challenges due to large intra-class target variation, limited image resolution, and poor detection and tracking results. To mitigate these chall...

Detailed description

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence, 2017-09, Vol.39 (9), p.1770-1782
Main authors:	Wang, Xiaoyang ; Ji, Qiang
Format:	Article
Language:	eng
Subjects:	Context ; Context modeling ; event recognition ; Hidden Markov models ; Hierarchical context model ; image context ; Image detection ; Image recognition ; Image resolution ; Object recognition ; Priming ; priming context ; Representations ; semantic context ; Semantics ; Surveillance ; Target recognition

container_end_page	1782
container_issue	9
container_start_page	1770
container_title	IEEE transactions on pattern analysis and machine intelligence
container_volume	39
creator	Wang, Xiaoyang ; Ji, Qiang
description	Current video event recognition research remains largely target-centered. For real-world surveillance videos, target-centered event recognition faces great challenges due to large intra-class target variation, limited image resolution, and poor detection and tracking results. To mitigate these challenges, we introduced a context-augmented video event recognition approach. Specifically, we explicitly capture different types of contexts from three levels including image level, semantic level, and prior level. At the image level, we introduce two types of contextual features including the appearance context features and interaction context features to capture the appearance of context objects and their interactions with the target objects. At the semantic level, we propose a deep model based on deep Boltzmann machine to learn event object representations and their interactions. At the prior level, we utilize two types of prior-level contexts including scene priming and dynamic cueing. Finally, we introduce a hierarchical context model that systematically integrates the contextual information at different levels. Through the hierarchical context model, contexts at different levels jointly contribute to the event recognition. We evaluate the hierarchical context model for event recognition on benchmark surveillance video datasets. Results show that incorporating contexts in each level can improve event recognition performance, and jointly integrating three levels of contexts through our hierarchical model achieves the best performance.
doi_str_mv	10.1109/TPAMI.2016.2616308
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_28113742</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7588132</ieee_id><sourcerecordid>1861613030</sourcerecordid><originalsourceid>FETCH-LOGICAL-c351t-30bc340e33f73b5610bf78c7ed3d241c766de26f857beed8f7b8f9c9794f265f3</originalsourceid><addsrcrecordid>eNpdkE1P3DAQhq2qqCzb_oFWqiL1wiVbjyf-OqIVBSQQqKK9Wokz3hplY-pkEfx7suzCAc1hDvO8o5mHsa_AFwDc_ry9Obm6WAgOaiEUKOTmA5uBRVuiRPuRzaaJKI0R5pAdDcMd51BJjp_YoTAAqCsxY_Y8Uq6z_xd93RXL1I_0OBZXqaUu9qsipFz8jS2l4vSB-rH4TT6t-jjG1H9mB6HuBvqy73P259fp7fK8vLw-u1ieXJYeJYwl8sZjxQkxaGykAt4EbbymFltRgddKtSRUMFI3RK0JujHBeqttFYSSAefseLf3Pqf_GxpGt46Dp66re0qbwYGZfgfkU83Zj3foXdrkfrrOgRUaQUqlJkrsKJ_TMGQK7j7HdZ2fHHC3FetexLqtWLcXO4W-71dvmjW1b5FXkxPwbQdEInoba2kMoMBnFZB64A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1927315566</pqid></control><display><type>article</type><title>Hierarchical Context Modeling for Video Event Recognition</title><source>IEEE Electronic Library (IEL)</source><creator>Wang, Xiaoyang ; Ji, Qiang</creator><creatorcontrib>Wang, Xiaoyang ; Ji, Qiang</creatorcontrib><description>Current video event recognition research remains largely target-centered. For real-world surveillance videos, target-centered event recognition faces great challenges due to large intra-class target variation, limited image resolution, and poor detection and tracking results. To mitigate these challenges, we introduced a context-augmented video event recognition approach. Specifically, we explicitly capture different types of contexts from three levels including image level, semantic level, and prior level. At the image level, we introduce two types of contextual features including the appearance context features and interaction context features to capture the appearance of context objects and their interactions with the target objects. 
At the semantic level, we propose a deep model based on deep Boltzmann machine to learn event object representations and their interactions. At the prior level, we utilize two types of prior-level contexts including scene priming and dynamic cueing. Finally, we introduce a hierarchical context model that systematically integrates the contextual information at different levels. Through the hierarchical context model, contexts at different levels jointly contribute to the event recognition. We evaluate the hierarchical context model for event recognition on benchmark surveillance video datasets. Results show that incorporating contexts in each level can improve event recognition performance, and jointly integrating three levels of contexts through our hierarchical model achieves the best performance.</description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 1939-3539</identifier><identifier>EISSN: 2160-9292</identifier><identifier>DOI: 10.1109/TPAMI.2016.2616308</identifier><identifier>PMID: 28113742</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Context ; Context modeling ; event recognition ; Hidden Markov models ; Hierarchical context model ; image context ; Image detection ; Image recognition ; Image resolution ; Object recognition ; Priming ; priming context ; Representations ; semantic context ; Semantics ; Surveillance ; Target recognition</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2017-09, Vol.39 (9), p.1770-1782</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c351t-30bc340e33f73b5610bf78c7ed3d241c766de26f857beed8f7b8f9c9794f265f3</citedby><cites>FETCH-LOGICAL-c351t-30bc340e33f73b5610bf78c7ed3d241c766de26f857beed8f7b8f9c9794f265f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7588132$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,782,786,798,27931,27932,54765</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/7588132$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28113742$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Wang, Xiaoyang</creatorcontrib><creatorcontrib>Ji, Qiang</creatorcontrib><title>Hierarchical Context Modeling for Video Event Recognition</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><description>Current video event recognition research remains largely target-centered. For real-world surveillance videos, target-centered event recognition faces great challenges due to large intra-class target variation, limited image resolution, and poor detection and tracking results. To mitigate these challenges, we introduced a context-augmented video event recognition approach. Specifically, we explicitly capture different types of contexts from three levels including image level, semantic level, and prior level. At the image level, we introduce two types of contextual features including the appearance context features and interaction context features to capture the appearance of context objects and their interactions with the target objects. 
At the semantic level, we propose a deep model based on deep Boltzmann machine to learn event object representations and their interactions. At the prior level, we utilize two types of prior-level contexts including scene priming and dynamic cueing. Finally, we introduce a hierarchical context model that systematically integrates the contextual information at different levels. Through the hierarchical context model, contexts at different levels jointly contribute to the event recognition. We evaluate the hierarchical context model for event recognition on benchmark surveillance video datasets. Results show that incorporating contexts in each level can improve event recognition performance, and jointly integrating three levels of contexts through our hierarchical model achieves the best performance.</description><subject>Context</subject><subject>Context modeling</subject><subject>event recognition</subject><subject>Hidden Markov models</subject><subject>Hierarchical context model</subject><subject>image context</subject><subject>Image detection</subject><subject>Image recognition</subject><subject>Image resolution</subject><subject>Object recognition</subject><subject>Priming</subject><subject>priming context</subject><subject>Representations</subject><subject>semantic context</subject><subject>Semantics</subject><subject>Surveillance</subject><subject>Target 
recognition</subject><issn>0162-8828</issn><issn>1939-3539</issn><issn>2160-9292</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkE1P3DAQhq2qqCzb_oFWqiL1wiVbjyf-OqIVBSQQqKK9Wokz3hplY-pkEfx7suzCAc1hDvO8o5mHsa_AFwDc_ry9Obm6WAgOaiEUKOTmA5uBRVuiRPuRzaaJKI0R5pAdDcMd51BJjp_YoTAAqCsxY_Y8Uq6z_xd93RXL1I_0OBZXqaUu9qsipFz8jS2l4vSB-rH4TT6t-jjG1H9mB6HuBvqy73P259fp7fK8vLw-u1ieXJYeJYwl8sZjxQkxaGykAt4EbbymFltRgddKtSRUMFI3RK0JujHBeqttFYSSAefseLf3Pqf_GxpGt46Dp66re0qbwYGZfgfkU83Zj3foXdrkfrrOgRUaQUqlJkrsKJ_TMGQK7j7HdZ2fHHC3FetexLqtWLcXO4W-71dvmjW1b5FXkxPwbQdEInoba2kMoMBnFZB64A</recordid><startdate>20170901</startdate><enddate>20170901</enddate><creator>Wang, Xiaoyang</creator><creator>Ji, Qiang</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope></search><sort><creationdate>20170901</creationdate><title>Hierarchical Context Modeling for Video Event Recognition</title><author>Wang, Xiaoyang ; Ji, Qiang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c351t-30bc340e33f73b5610bf78c7ed3d241c766de26f857beed8f7b8f9c9794f265f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Context</topic><topic>Context modeling</topic><topic>event recognition</topic><topic>Hidden Markov models</topic><topic>Hierarchical context model</topic><topic>image context</topic><topic>Image detection</topic><topic>Image recognition</topic><topic>Image resolution</topic><topic>Object recognition</topic><topic>Priming</topic><topic>priming 
context</topic><topic>Representations</topic><topic>semantic context</topic><topic>Semantics</topic><topic>Surveillance</topic><topic>Target recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Xiaoyang</creatorcontrib><creatorcontrib>Ji, Qiang</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Wang, Xiaoyang</au><au>Ji, Qiang</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Hierarchical Context Modeling for Video Event Recognition</atitle><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle><stitle>TPAMI</stitle><addtitle>IEEE Trans Pattern Anal Mach Intell</addtitle><date>2017-09-01</date><risdate>2017</risdate><volume>39</volume><issue>9</issue><spage>1770</spage><epage>1782</epage><pages>1770-1782</pages><issn>0162-8828</issn><eissn>1939-3539</eissn><eissn>2160-9292</eissn><coden>ITPIDJ</coden><abstract>Current video event recognition research remains largely target-centered. 
For real-world surveillance videos, target-centered event recognition faces great challenges due to large intra-class target variation, limited image resolution, and poor detection and tracking results. To mitigate these challenges, we introduced a context-augmented video event recognition approach. Specifically, we explicitly capture different types of contexts from three levels including image level, semantic level, and prior level. At the image level, we introduce two types of contextual features including the appearance context features and interaction context features to capture the appearance of context objects and their interactions with the target objects. At the semantic level, we propose a deep model based on deep Boltzmann machine to learn event object representations and their interactions. At the prior level, we utilize two types of prior-level contexts including scene priming and dynamic cueing. Finally, we introduce a hierarchical context model that systematically integrates the contextual information at different levels. Through the hierarchical context model, contexts at different levels jointly contribute to the event recognition. We evaluate the hierarchical context model for event recognition on benchmark surveillance video datasets. Results show that incorporating contexts in each level can improve event recognition performance, and jointly integrating three levels of contexts through our hierarchical model achieves the best performance.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>28113742</pmid><doi>10.1109/TPAMI.2016.2616308</doi><tpages>13</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0162-8828
ispartof	IEEE transactions on pattern analysis and machine intelligence, 2017-09, Vol.39 (9), p.1770-1782
issn	0162-8828 1939-3539 2160-9292
language	eng
recordid	cdi_pubmed_primary_28113742
source	IEEE Electronic Library (IEL)
subjects	Context ; Context modeling ; event recognition ; Hidden Markov models ; Hierarchical context model ; image context ; Image detection ; Image recognition ; Image resolution ; Object recognition ; Priming ; priming context ; Representations ; semantic context ; Semantics ; Surveillance ; Target recognition
title	Hierarchical Context Modeling for Video Event Recognition
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-05T02%3A37%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hierarchical%20Context%20Modeling%20for%20Video%20Event%20Recognition&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Wang,%20Xiaoyang&rft.date=2017-09-01&rft.volume=39&rft.issue=9&rft.spage=1770&rft.epage=1782&rft.pages=1770-1782&rft.issn=0162-8828&rft.eissn=1939-3539&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2016.2616308&rft_dat=%3Cproquest_RIE%3E1861613030%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1927315566&rft_id=info:pmid/28113742&rft_ieee_id=7588132&rfr_iscdi=true