Jointly Learning the Attributes and Composition of Shots for Boundary Detection in Videos

In filmmaking, the shot has a profound influence on how movie content is delivered and how audiences respond to it: different emotions and content can be conveyed through well-designed camera movements or shot editing. Therefore, in pursuit of a high-level understanding of long videos, accurate shot detection in untrimmed videos should be considered the first and most fundamental step. Existing approaches address this problem based on the visual differences and content transitions between consecutive frames, while ignoring intrinsic shot attributes, viz. camera movement, scale, and viewing angle, which essentially reveal how each shot is created. In this work, we propose a new learning framework (SCTSNet) for shot boundary detection that jointly recognizes the attributes and composition of shots in videos. To facilitate the analysis of shots and the evaluation of shot detection models, we collect a large-scale shot boundary dataset, MovieShots2, which contains 15K shots from 282 movie clips. It is richly annotated with the temporal boundaries between consecutive shots and with individual shot attributes, including camera movement, scale, and viewing angle, the three most distinctive shot attributes. Our experiments show that the joint learning framework significantly boosts boundary detection performance, surpassing previous scores by a large margin: SCTSNet improves shot boundary detection AP from 0.65 to 0.77, pushing the performance to a new level.
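
The abstract's key idea, jointly learning shot boundaries and shot attributes with one model, can be made concrete with a small multi-task sketch. The following is a minimal, hypothetical PyTorch illustration, not SCTSNet's published architecture: the bidirectional-GRU encoder over pre-extracted frame features, the head layouts, the class counts for movement/scale/angle, and the loss weight `w_attr` are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointShotModel(nn.Module):
    """Multi-task sketch: a shared clip encoder feeds one boundary head
    and three shot-attribute heads (camera movement, scale, viewing angle).
    This is an assumed stand-in, not the SCTSNet architecture."""

    def __init__(self, feat_dim=512, n_movement=4, n_scale=5, n_angle=3):
        super().__init__()
        # Hypothetical shared temporal encoder over per-frame CNN features.
        self.encoder = nn.GRU(input_size=2048, hidden_size=feat_dim,
                              batch_first=True, bidirectional=True)
        d = feat_dim * 2
        self.boundary_head = nn.Linear(d, 1)           # per-position boundary score
        self.movement_head = nn.Linear(d, n_movement)  # e.g. static/pan/zoom/...
        self.scale_head = nn.Linear(d, n_scale)        # e.g. close-up ... long shot
        self.angle_head = nn.Linear(d, n_angle)        # e.g. low/eye-level/high

    def forward(self, frame_feats):
        # frame_feats: (batch, time, 2048) pre-extracted frame features
        h, _ = self.encoder(frame_feats)               # (batch, time, 2*feat_dim)
        clip = h.mean(dim=1)                           # simple clip-level pooling
        return {
            "boundary": self.boundary_head(h).squeeze(-1),  # (batch, time) logits
            "movement": self.movement_head(clip),
            "scale": self.scale_head(clip),
            "angle": self.angle_head(clip),
        }

def joint_loss(out, tgt, w_attr=0.5):
    """Weighted sum of the boundary loss and the attribute losses;
    the weighting scheme here is an assumption for illustration."""
    bce = nn.functional.binary_cross_entropy_with_logits
    ce = nn.functional.cross_entropy
    l_boundary = bce(out["boundary"], tgt["boundary"])   # float targets in {0,1}
    l_attr = (ce(out["movement"], tgt["movement"])
              + ce(out["scale"], tgt["scale"])
              + ce(out["angle"], tgt["angle"]))
    return l_boundary + w_attr * l_attr
```

The design point the paper argues for is visible in `joint_loss`: gradients from the attribute heads shape the shared encoder features that the boundary head also consumes, so learning how each shot is created supports detecting where shots change.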

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2022-01, Vol. 24, pp. 3049-3059
Authors: Jiang, Xuekun; Jin, Libiao; Rao, Anyi; Xu, Linning; Lin, Dahua
Format: Article
Language: English
Publisher: IEEE (Piscataway)
DOI: 10.1109/TMM.2021.3092143
ISSN: 1520-9210
EISSN: 1941-0077
Source: IEEE Electronic Library (IEL)
Subjects: Annotations; boundary detection; Cameras; cinematic style; Composition; Convolution; Feature extraction; Learning; Motion pictures; Shot type; Video; Videos; Viewing; Visualization