A chains model for localizing participants of group activities in videos
Field | Value |
---|---|
container_start_page | 786 |
container_end_page | 793 |
container_title | 2011 International Conference on Computer Vision |
creator | Amer, M. R.; Todorovic, S. |
description | Given a video, we would like to recognize group activities, localize the video parts where these activities occur, and detect the actors involved in them. This advances prior work, which typically focuses only on video classification. We make a number of contributions. First, we specify a new mid-level video feature aimed at summarizing local visual cues into bags of the right detections (BORDs). BORDs seek to identify the right people who participate in a target group activity among many noisy people detections. Second, we formulate a new, generative chains model of group activities. Inference of the chains model identifies a subset of BORDs in the video that belong to occurrences of the activity, and organizes them into an ensemble of temporal chains. The chains extend over, and thus localize, the time intervals occupied by the activity. We formulate a new MAP inference algorithm that iterates two steps: i) warp the chains of BORDs in space and time to their expected locations, so the transformed BORDs can better summarize local visual cues; and ii) maximize the posterior probability of the chains. We outperform the state of the art on the benchmark UT-Human Interaction and Collective Activities datasets, under reasonable running times. (A minimal illustrative sketch of this two-step inference follows the record.) |
doi | 10.1109/ICCV.2011.6126317 |
format | Conference Proceeding |
identifier | ISSN: 1550-5499; EISSN: 2380-7504; ISBN: 9781457711015; DOI: 10.1109/ICCV.2011.6126317 |
ispartof | 2011 International Conference on Computer Vision, 2011, p. 786-793 |
issn | 1550-5499; 2380-7504 |
language | eng |
recordid | cdi_ieee_primary_6126317 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Feature extraction; Histograms; Humans; Layout; Spatiotemporal phenomena; Videos; Visualization |
title | A chains model for localizing participants of group activities in videos |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T20%3A41%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20chains%20model%20for%20localizing%20participants%20of%20group%20activities%20in%20videos&rft.btitle=2011%20International%20Conference%20on%20Computer%20Vision&rft.au=Amer,%20M.%20R.&rft.date=2011-01-01&rft.spage=786&rft.epage=793&rft.pages=786-793&rft.issn=1550-5499&rft.eissn=2380-7504&rft.isbn=9781457711015&rft.isbn_list=145771101X&rft_id=info:doi/10.1109/ICCV.2011.6126317&rft_dat=%3Cieee_6IE%3E6126317%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1457711001&rft.eisbn_list=1457711028&rft.eisbn_list=9781457711022&rft.eisbn_list=9781457711008&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6126317&rfr_iscdi=true |
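
The abstract's MAP inference alternates two steps: i) warp the chains of BORDs toward their expected space-time locations, and ii) maximize the posterior probability of the chains. The sketch below illustrates that alternating pattern on toy (t, x, y) detections. It is not the authors' implementation: the isotropic-Gaussian likelihood, the fixed noise radius, and all function names are simplifying assumptions made for illustration, and each chain is collapsed to a single space-time anchor rather than a full temporal chain.

```python
import numpy as np

def map_inference(bords, k=2, n_iters=20, radius=2.0, seed=0):
    """Toy alternating MAP inference over 'chains' of detections.

    bords: (N, 3) array of candidate detections as (t, x, y) points.
    Returns k chains, each with a space-time anchor and the indices of
    the detections assigned to it; unassigned detections count as noise.
    """
    rng = np.random.default_rng(seed)
    # Initialize each chain's anchor at a randomly chosen detection.
    anchors = bords[rng.choice(len(bords), size=k, replace=False)].copy()
    assign = _maximize_posterior(bords, anchors, radius)
    for _ in range(n_iters):
        # Step i) Warp: move each chain toward the mean of its assigned
        # detections, i.e., toward its expected space-time location.
        for j in range(k):
            members = bords[assign == j]
            if len(members) > 0:
                anchors[j] = members.mean(axis=0)
        # Step ii) Maximize the posterior: re-select which detections
        # belong to which chain; far-away (noisy) detections are dropped,
        # mirroring how BORDs keep only the "right" people detections.
        assign = _maximize_posterior(bords, anchors, radius)
    return [{"anchor": anchors[j], "members": np.flatnonzero(assign == j)}
            for j in range(k)]

def _maximize_posterior(bords, anchors, radius):
    # Under a toy isotropic-Gaussian likelihood with a flat prior, the MAP
    # assignment is "nearest anchor, if within radius" (-1 marks noise).
    dist = np.linalg.norm(bords[:, None, :] - anchors[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    nearest[dist.min(axis=1) > radius] = -1
    return nearest
```

The single-anchor simplification keeps the example short; in the paper the chains are temporal, extending over, and thereby localizing, the intervals the activity occupies.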