A chains model for localizing participants of group activities in videos

Given a video, we would like to recognize group activities, localize video parts where these activities occur, and detect actors involved in them. This advances prior work that typically focuses only on video classification. We make a number of contributions. First, we specify a new, mid-level, video feature aimed at summarizing local visual cues into bags of the right detections (BORDs). BORDs seek to identify the right people who participate in a target group activity among many noisy people detections. Second, we formulate a new, generative, chains model of group activities. Inference of the chains model identifies a subset of BORDs in the video that belong to occurrences of the activity, and organizes them in an ensemble of temporal chains. The chains extend over, and thus localize, the time intervals occupied by the activity. We formulate a new MAP inference algorithm that iterates two steps: i) Warps the chains of BORDs in space and time to their expected locations, so the transformed BORDs can better summarize local visual cues; and ii) Maximizes the posterior probability of the chains. We outperform the state of the art on benchmark UT-Human Interaction and Collective Activities datasets, under reasonable running times.
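
The alternating MAP inference described in the abstract can be conveyed with a small sketch. The following Python is a hypothetical toy version, not the authors' implementation: candidate (x, y) people detections per frame stand in for noisy detections, a moving-average smoother stands in for the learned space-time warp of step (i), and a confidence-minus-distance objective stands in for the posterior maximized in step (ii). All function and variable names here are assumptions.

```python
import numpy as np

# Hypothetical toy version of the two-step MAP inference (an assumption,
# not the authors' code): alternate (i) warping a chain of detections
# toward its expected space-time locations and (ii) re-maximizing a
# simple posterior surrogate by re-picking one detection per frame.

def smooth_trajectory(traj, k=3):
    """Stand-in for step (i): expected chain locations as a moving
    average of the current (x, y) trajectory."""
    pad = np.pad(traj, ((k // 2, k // 2), (0, 0)), mode="edge")
    kernel = np.ones(k) / k
    return np.stack(
        [np.convolve(pad[:, d], kernel, mode="valid") for d in range(traj.shape[1])],
        axis=1,
    )

def infer_chain(detections, scores, lam=0.05, n_iters=10):
    """detections[t]: (N_t, 2) candidate positions in frame t;
    scores[t]: (N_t,) detector confidences. Returns one index per frame."""
    T = len(detections)
    chain = [int(np.argmax(scores[t])) for t in range(T)]  # greedy init
    for _ in range(n_iters):
        # (i) Warp: where the chain is expected to be in each frame.
        traj = np.array([detections[t][chain[t]] for t in range(T)])
        expected = smooth_trajectory(traj)
        # (ii) Maximize: per frame, trade detector confidence against
        # distance to the expected location (posterior surrogate).
        chain_new = [
            int(np.argmax(
                scores[t] - lam * np.linalg.norm(detections[t] - expected[t], axis=1)
            ))
            for t in range(T)
        ]
        if chain_new == chain:  # fixed point reached
            break
        chain = chain_new
    return chain

# Example: 5 frames, 4 noisy detections each.
rng = np.random.default_rng(0)
dets = [rng.uniform(0, 100, size=(4, 2)) for _ in range(5)]
confs = [rng.uniform(0, 1, size=4) for _ in range(5)]
print(infer_chain(dets, confs))
```

In the paper the warp is estimated jointly over an ensemble of chains and the posterior comes from a full generative model; the sketch only conveys the coordinate-ascent structure of the two alternating steps.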

Bibliographic details

Main authors: Amer, M. R., Todorovic, S.
Format: Conference proceedings
Language: English
Published in: 2011 International Conference on Computer Vision, 2011, pp. 786-793
Publisher: IEEE
DOI: 10.1109/ICCV.2011.6126317
ISSN: 1550-5499
EISSN: 2380-7504
ISBN: 9781457711015, 145771101X
EISBN: 1457711001, 1457711028, 9781457711022, 9781457711008
Source: IEEE Electronic Library (IEL) Conference Proceedings
Record ID: cdi_ieee_primary_6126317
Subjects: Feature extraction, Histograms, Humans, Layout, Spatiotemporal phenomena, Videos, Visualization