Large-Scale Automatic Labeling of Video Events with Verbs Based on Event-Participant Interaction

We present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event participants, humans and objects interacting with each other, abstracting away all object-class information and fine-grained image characteristics, and relying solely on the coarse-grained motion of the event participants. We apply our approach to a large set of 22 distinct verb classes and a corpus of 2,584 videos, yielding two surprising outcomes. First, a classification accuracy of greater than 70% on a 1-out-of-22 labeling task and greater than 85% on a variety of 1-out-of-10 subsets of this labeling task is independent of the choice of which of two different time-series classifiers we employ. Second, we achieve this level of accuracy using a highly impoverished intermediate representation consisting solely of the bounding boxes of one or two event participants as a function of time. This indicates that successful event recognition depends more on the choice of appropriate features that characterize the linguistic invariants of the event classes than on the particular classifier algorithms.

Detailed Description

Bibliographic Details
Published in: arXiv.org 2012-04
Main Authors: Barbu, Andrei, Bridge, Alexander, Coroian, Dan, Dickinson, Sven, Mussman, Sam, Narayanaswamy, Siddharth, Salvi, Dhaval, Schmidt, Lara, Shangguan, Jiangnan, Siskind, Jeffrey Mark, Waggoner, Jarrell, Wang, Song, Wei, Jinlian, Yin, Yifan, Zhang, Zhiqi
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_title arXiv.org
creator Barbu, Andrei
Bridge, Alexander
Coroian, Dan
Dickinson, Sven
Mussman, Sam
Narayanaswamy, Siddharth
Salvi, Dhaval
Schmidt, Lara
Shangguan, Jiangnan
Siskind, Jeffrey Mark
Waggoner, Jarrell
Wang, Song
Wei, Jinlian
Yin, Yifan
Zhang, Zhiqi
description We present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event participants, humans and objects interacting with each other, abstracting away all object-class information and fine-grained image characteristics, and relying solely on the coarse-grained motion of the event participants. We apply our approach to a large set of 22 distinct verb classes and a corpus of 2,584 videos, yielding two surprising outcomes. First, a classification accuracy of greater than 70% on a 1-out-of-22 labeling task and greater than 85% on a variety of 1-out-of-10 subsets of this labeling task is independent of the choice of which of two different time-series classifiers we employ. Second, we achieve this level of accuracy using a highly impoverished intermediate representation consisting solely of the bounding boxes of one or two event participants as a function of time. This indicates that successful event recognition depends more on the choice of appropriate features that characterize the linguistic invariants of the event classes than on the particular classifier algorithms.
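The description above reduces each clip to the bounding boxes of one or two participants tracked over time and feeds that coarse representation to a generic time-series classifier. As a rough illustration of that pipeline (not the authors' implementation), the Python sketch below derives per-frame features from two hypothetical bounding-box tracks (box centers, sizes, and the separation between participants) and labels a query clip with a simple dynamic-time-warping nearest-neighbor classifier; DTW stands in for the unspecified time-series classifiers, and all names and the toy data are invented for the example.

```python
import numpy as np


def track_features(boxes_a, boxes_b):
    """Convert two aligned bounding-box tracks (each a T x 4 array of
    [x, y, w, h] per frame) into a per-frame feature sequence built only
    from coarse motion: box centers, box sizes, and the distance between
    the two participants."""
    centers_a = boxes_a[:, :2] + boxes_a[:, 2:] / 2.0
    centers_b = boxes_b[:, :2] + boxes_b[:, 2:] / 2.0
    separation = np.linalg.norm(centers_a - centers_b, axis=1, keepdims=True)
    return np.hstack([centers_a, boxes_a[:, 2:],
                      centers_b, boxes_b[:, 2:], separation])


def dtw_distance(seq_x, seq_y):
    """Dynamic-time-warping distance between two feature sequences of
    possibly different lengths (standard O(n*m) dynamic program)."""
    n, m = len(seq_x), len(seq_y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_x[i - 1] - seq_y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]


def classify(query_feats, labeled_clips):
    """1-nearest-neighbor verb labeling under DTW distance;
    `labeled_clips` is a list of (feature_sequence, verb) pairs."""
    best_verb, best_dist = None, np.inf
    for feats, verb in labeled_clips:
        d = dtw_distance(query_feats, feats)
        if d < best_dist:
            best_verb, best_dist = verb, d
    return best_verb


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def toy_clip(direction, frames=30):
        # Two boxes drifting toward (direction=+1) or away from
        # (direction=-1) each other along the x axis, with a little noise.
        t = np.linspace(0.0, 1.0, frames)[:, None]
        box_a = np.hstack([100 + 50 * direction * t, 100 + 0 * t,
                           np.full((frames, 1), 40.0),
                           np.full((frames, 1), 80.0)])
        box_b = np.hstack([300 - 50 * direction * t, 100 + 0 * t,
                           np.full((frames, 1), 40.0),
                           np.full((frames, 1), 80.0)])
        return track_features(box_a + rng.normal(0, 1, box_a.shape),
                              box_b + rng.normal(0, 1, box_b.shape))

    training = [(toy_clip(+1), "approach"), (toy_clip(-1), "leave")]
    print(classify(toy_clip(+1), training))  # expected output: approach
```

The toy clips merely move two boxes toward or away from each other, so the query is expected to receive the "approach" label; the point is only that the feature sequence carries no object-class or appearance information, mirroring the impoverished representation the abstract describes.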
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2012-04
issn 2331-8422
language eng
recordid cdi_proquest_journals_2085728652
source Free E-Journals
subjects Algorithms
Classifiers
Labeling
Labels
title Large-Scale Automatic Labeling of Video Events with Verbs Based on Event-Participant Interaction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T22%3A59%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Large-Scale%20Automatic%20Labeling%20of%20Video%20Events%20with%20Verbs%20Based%20on%20Event-Participant%20Interaction&rft.jtitle=arXiv.org&rft.au=Barbu,%20Andrei&rft.date=2012-04-16&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2085728652%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2085728652&rft_id=info:pmid/&rfr_iscdi=true