When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
We consider the task of temporal human action localization in lifestyle vlogs. We introduce a novel dataset consisting of manual annotations of temporal localization for 13,000 narrated actions in 1,200 video clips. We present an extensive analysis of this data, which allows us to better understand how the language and visual modalities interact throughout the videos. We propose a simple yet effective method to localize the narrated actions based on their expected duration. Through several experiments and analyses, we show that our method brings complementary information with respect to previous methods, and leads to improvements over previous work for the task of temporal action localization.
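The abstract's core idea, localizing a narrated action using a prior on its expected duration, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's actual method: it simply centers a window of the expected duration on the time the action is mentioned in the narration, clipped to the video boundaries.

```python
# Illustrative sketch only (hypothetical, not the authors' implementation):
# given the time an action is narrated and a prior on its expected duration,
# propose a candidate temporal interval for the action. Times in seconds.

def propose_interval(mention_time: float, expected_duration: float,
                     clip_length: float) -> tuple[float, float]:
    """Center a window of the expected duration on the narration time,
    clipped to [0, clip_length]."""
    start = max(0.0, mention_time - expected_duration / 2)
    end = min(clip_length, start + expected_duration)
    return start, end
```

For example, an action narrated at 30 s with an expected duration of 10 s in a 60 s clip yields the interval (25.0, 35.0); near the clip start, the window shifts forward rather than extending before time zero.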
Saved in:
Published in: | ACM Transactions on Multimedia Computing, Communications, and Applications, 2022-11, Vol. 18 (3s), p. 1-18, Article 142 |
---|---|
Main Authors: | Ignat, Oana; Castro, Santiago; Zhou, Yuhang; Bao, Jiajun; Shan, Dandan; Mihalcea, Rada |
Format: | Article |
Language: | English |
Subjects: | Information systems; Multimedia streaming |
Online Access: | Full Text |
doi | 10.1145/3495211 |
identifier | ISSN: 1551-6857; EISSN: 1551-6865 |
publisher | New York, NY: ACM |
source | ACM Digital Library Complete |