A Unified Framework for Shot Type Classification Based on Subject Centric Lens

Shots are key narrative elements of various videos, e.g. movies, TV series, and user-generated videos that are thriving over the Internet. The types of shots greatly influence how the underlying ideas, emotions, and messages are expressed. The technique to analyze shot types is important to the understanding of videos, which has seen increasing demand in real-world applications in this era. Classifying shot type is challenging due to the additional information required beyond the video content, such as the spatial composition of a frame and camera movement. To address these issues, we propose a learning framework Subject Guidance Network (SGNet) for shot type recognition. SGNet separates the subject and background of a shot into two streams, serving as separate guidance maps for scale and movement type classification respectively. To facilitate shot type analysis and model evaluations, we build a large-scale dataset MovieShots, which contains 46K shots from 7K movie trailers with annotations of their scale and movement types. Experiments show that our framework is able to recognize these two attributes of shot accurately, outperforming all the previous methods.

Detailed description

Bibliographic details
Main authors: Rao, Anyi; Wang, Jiaze; Xu, Linning; Jiang, Xuekun; Huang, Qingqiu; Zhou, Bolei; Lin, Dahua
Format: Article
Language: eng
Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning; Computer Science - Multimedia
Online access: Order full text
description Shots are key narrative elements of various videos, e.g. movies, TV series, and user-generated videos that are thriving over the Internet. The types of shots greatly influence how the underlying ideas, emotions, and messages are expressed. The technique to analyze shot types is important to the understanding of videos, which has seen increasing demand in real-world applications in this era. Classifying shot type is challenging due to the additional information required beyond the video content, such as the spatial composition of a frame and camera movement. To address these issues, we propose a learning framework Subject Guidance Network (SGNet) for shot type recognition. SGNet separates the subject and background of a shot into two streams, serving as separate guidance maps for scale and movement type classification respectively. To facilitate shot type analysis and model evaluations, we build a large-scale dataset MovieShots, which contains 46K shots from 7K movie trailers with annotations of their scale and movement types. Experiments show that our framework is able to recognize these two attributes of shot accurately, outperforming all the previous methods.
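The description above outlines a two-stream design: subject and background features act as separate guidance signals for the two classification tasks (shot scale and camera movement). The following is a minimal illustrative sketch of that idea in NumPy — the feature sizes, class counts, and fusion scheme here are hypothetical placeholders, not the paper's actual SGNet architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
FEAT = 64      # per-stream feature size
N_SCALE = 5    # e.g. long, full, medium, close-up, extreme close-up
N_MOVE = 4     # illustrative movement classes, e.g. static, pan, zoom, motion


def linear(x, w, b):
    """Plain affine layer: x @ w + b."""
    return x @ w + b


class TwoStreamShotClassifier:
    """Toy two-stream model: a subject stream and a background stream
    are fused and fed to two separate heads, one per attribute."""

    def __init__(self):
        self.w_scale = rng.standard_normal((2 * FEAT, N_SCALE)) * 0.01
        self.b_scale = np.zeros(N_SCALE)
        self.w_move = rng.standard_normal((2 * FEAT, N_MOVE)) * 0.01
        self.b_move = np.zeros(N_MOVE)

    def forward(self, subject_feat, background_feat):
        # Concatenate the two guidance streams; in the paper's framing,
        # the subject map mainly informs shot scale while the background
        # map mainly informs camera movement.
        fused = np.concatenate([subject_feat, background_feat], axis=-1)
        scale_logits = linear(fused, self.w_scale, self.b_scale)
        move_logits = linear(fused, self.w_move, self.b_move)
        return scale_logits, move_logits


model = TwoStreamShotClassifier()
subj = rng.standard_normal(FEAT)   # stand-in for subject-stream features
bg = rng.standard_normal(FEAT)     # stand-in for background-stream features
scale_logits, move_logits = model.forward(subj, bg)
print(scale_logits.shape, move_logits.shape)  # (5,) (4,)
```

The point of the sketch is only the structural split: one shared fused representation, two independent heads so that scale and movement can be supervised and evaluated separately, as in the paper's two-attribute annotation of MovieShots.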
doi 10.48550/arxiv.2008.03548
format Article
identifier DOI: 10.48550/arxiv.2008.03548
language eng
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
Computer Science - Multimedia
title A Unified Framework for Shot Type Classification Based on Subject Centric Lens