Learning realistic human actions from movies

The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.
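The abstract mentions a multi-channel non-linear SVM over bag-of-features histograms. The Python sketch below illustrates one plausible reading of that classifier: per-channel chi-square distances combined in an exponential kernel K = exp(-sum_c D_c / A_c), with A_c the mean training distance for channel c, as described in the paper. The channel contents (e.g. HOG/HOF histograms), the toy data, and the scikit-learn usage are illustrative assumptions, not the authors' implementation.

# A minimal sketch of a multi-channel chi-square kernel SVM.
# Toy data and channel names are assumptions for illustration only.
import numpy as np
from sklearn.svm import SVC

def chi2_distance(X, Y):
    # Pairwise chi-square distance between histogram rows:
    # D(x, y) = 0.5 * sum_k (x_k - y_k)^2 / (x_k + y_k)
    diff = X[:, None, :] - Y[None, :, :]
    summ = X[:, None, :] + Y[None, :, :] + 1e-10  # guard empty bins
    return 0.5 * (diff ** 2 / summ).sum(axis=-1)

def combined_kernel(channels_a, channels_b, scales):
    # K = exp(-sum over channels of D_c / A_c)
    log_k = 0.0
    for Xa, Xb, a_c in zip(channels_a, channels_b, scales):
        log_k = log_k - chi2_distance(Xa, Xb) / a_c
    return np.exp(log_k)

# Toy histograms standing in for two hypothetical channels, e.g. HOG and
# HOF descriptors pooled over a space-time grid (64 codebook bins each).
rng = np.random.default_rng(0)
train = [rng.dirichlet(np.ones(64), size=40) for _ in range(2)]
labels = rng.integers(0, 2, size=40)

# Fix each channel's scale A_c on the training distances.
scales = [chi2_distance(X, X).mean() for X in train]

svm = SVC(kernel="precomputed")
svm.fit(combined_kernel(train, train, scales), labels)

test = [rng.dirichlet(np.ones(64), size=5) for _ in range(2)]
predictions = svm.predict(combined_kernel(test, train, scales))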


Saved in:
Bibliographic Details
Main Authors: Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Order full text
container_end_page 8
container_start_page 1
creator Laptev, I.
Marszalek, M.
Schmid, C.
Rozenfeld, B.
doi_str_mv 10.1109/CVPR.2008.4587756
format Conference Proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 1063-6919; ISBN: 9781424422425, 1424422426; EISBN: 9781424422432, 1424422434
ispartof 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008, p.1-8
issn 1063-6919
language eng
recordid cdi_ieee_primary_4587756
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Cameras
Computer Science
Computer Vision and Pattern Recognition
Humans
Image recognition
Layout
Motion pictures
Object recognition
Robustness
Text categorization
Video sharing
title Learning realistic human actions from movies
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T13%3A37%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-hal_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Learning%20realistic%20human%20actions%20from%20movies&rft.btitle=2008%20IEEE%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition&rft.au=Laptev,%20I.&rft.date=2008-01-01&rft.spage=1&rft.epage=8&rft.pages=1-8&rft.issn=1063-6919&rft.isbn=9781424422425&rft.isbn_list=1424422426&rft_id=info:doi/10.1109/CVPR.2008.4587756&rft_dat=%3Chal_6IE%3Eoai_HAL_inria_00548659v1%3C/hal_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=9781424422432&rft.eisbn_list=1424422434&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4587756&rfr_iscdi=true