Affect in Multimedia: Benchmarking Violent Scenes Detection

In this article, we report on the creation of a publicly available, common evaluation framework for Violent Scenes Detection (VSD) in Hollywood and YouTube videos. We propose a robust data set, the VSD96, with more than 96 hours of video of various genres, annotations at different levels of detail (...

Bibliographic details
Published in: IEEE transactions on affective computing 2022-01, Vol.13 (1), p.347-366
Main authors: Constantin, Mihai Gabriel ; Stefan, Liviu-Daniel ; Ionescu, Bogdan ; Demarty, Claire-Helene ; Sjoberg, Mats ; Schedl, Markus ; Gravier, Guillaume
Format: Article
Language: English
container_end_page 366
container_issue 1
container_start_page 347
container_title IEEE transactions on affective computing
container_volume 13
creator Constantin, Mihai Gabriel
Stefan, Liviu-Daniel
Ionescu, Bogdan
Demarty, Claire-Helene
Sjoberg, Mats
Schedl, Markus
Gravier, Guillaume
description In this article, we report on the creation of a publicly available, common evaluation framework for Violent Scenes Detection (VSD) in Hollywood and YouTube videos. We propose a robust data set, the VSD96, with more than 96 hours of video of various genres, annotations at different levels of detail (e.g., shot-level, segment-level), annotations of mid-level concepts (e.g., blood, fire), various pre-computed multi-modal descriptors, and over 230 system output results as baselines. This is the most comprehensive data set available to date tailored to the VSD task and was extensively validated during the MediaEval benchmarking campaigns. Furthermore, we provide an in-depth analysis of the crucial components of VSD algorithms, by reviewing the capabilities and the evolution of existing systems (e.g., overall trends and outliers, the influence of the employed features and fusion techniques, the influence of deep learning approaches). Finally, we discuss the possibility of going beyond state-of-the-art performance via an ad-hoc late fusion approach. Experimentation is carried out on the VSD96 data. We provide the most important lessons learned and insights gained. The increasing number of publications using the VSD96 data underlines the importance of the topic. The presented and published resources are a practitioner's guide and also a strong baseline to overcome, which will help researchers for the coming years in analyzing aspects of audio-visual affect and violence detection in movies and videos.
doi_str_mv 10.1109/TAFFC.2020.2986969
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2635044327</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9064936</ieee_id><sourcerecordid>2635044327</sourcerecordid><originalsourceid>FETCH-LOGICAL-c373t-b9325d30682a5fc261012298d2a40d732a6044ecfec229b93c53cf0a045332313</originalsourceid><addsrcrecordid>eNpNkEFLwzAYhoMoOOb-gF4Knjx0fsnXpoue6nROmHhweg0xTV1ml86mE_z3plaGuSR8PO-Xh5eQUwpjSkFcLvPZbDpmwGDMxIQLLg7IgIpExAhJevjvfUxG3q8hHETkLBuQ67wsjW4j66LHXdXajSmsuopujNOrjWo-rHuPXm1dGddGz9o446Nb04aErd0JOSpV5c3o7x6Sl9ndcjqPF0_3D9N8EWvMsI3fBLK0QOATptJSM06BsiBaMJVAkSFTHJLE6OARxoHWKeoSVPBFZEhxSC76vStVyW1jg9e3rJWV83whuxkgIMvo5Ktjz3t229SfO-Nbua53jQt6knFMw0eBDBTrKd3U3jem3K-lILtO5W-nsutU_nUaQmd9yBpj9gEBPBHI8Qff527r</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2635044327</pqid></control><display><type>article</type><title>Affect in Multimedia: Benchmarking Violent Scenes Detection</title><source>IEEE Electronic Library (IEL)</source><creator>Constantin, Mihai Gabriel ; Stefan, Liviu-Daniel ; Ionescu, Bogdan ; Demarty, Claire-Helene ; Sjoberg, Mats ; Schedl, Markus ; Gravier, Guillaume</creator><creatorcontrib>Constantin, Mihai Gabriel ; Stefan, Liviu-Daniel ; Ionescu, Bogdan ; Demarty, Claire-Helene ; Sjoberg, Mats ; Schedl, Markus ; Gravier, Guillaume</creatorcontrib><description>In this article, we report on the creation of a publicly available, common evaluation framework for Violent Scenes Detection (VSD) in Hollywood and YouTube videos. We propose a robust data set, the VSD96, with more than 96 hours of video of various genres, annotations at different levels of detail (e.g., shot-level, segment-level), annotations of mid-level concepts (e.g., blood, fire), various pre-computed multi-modal descriptors, and over 230 system output results as baselines. 
This is the most comprehensive data set available to this date tailored to the VSD task and was extensively validated during the MediaEval benchmarking campaigns. Furthermore, we provide an in-depth analysis of the crucial components of VSD algorithms, by reviewing the capabilities and the evolution of existing systems (e.g., overall trends and outliers, the influence of the employed features and fusion techniques, the influence of deep learning approaches). Finally, we discuss the possibility of going beyond state-of-the-art performance via an ad-hoc late fusion approach. Experimentation is carried out on the VSD96 data. We provide the most important lessons learned and gained insights. The increasing number of publications using the VSD96 data underline the importance of the topic. The presented and published resources are a practitioner's guide and also a strong baseline to overcome, which will help researchers for the coming years in analyzing aspects of audio-visual affect and violence detection in movies and videos.</description><identifier>ISSN: 1949-3045</identifier><identifier>EISSN: 1949-3045</identifier><identifier>DOI: 10.1109/TAFFC.2020.2986969</identifier><identifier>CODEN: ITACBQ</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Annotations ; Benchmark testing ; benchmarking ; Benchmarks ; Computer Science ; Computer Vision and Pattern Recognition ; Datasets ; Experimentation ; literature review ; Machine Learning ; Motion pictures ; multi-modal content description ; Multimedia ; Outliers (statistics) ; Sound ; Task analysis ; Video ; Videos ; Violent scenes detection ; Visual aspects ; VSD96 data set ; YouTube</subject><ispartof>IEEE transactions on affective computing, 2022-01, Vol.13 (1), p.347-366</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2022</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c373t-b9325d30682a5fc261012298d2a40d732a6044ecfec229b93c53cf0a045332313</citedby><cites>FETCH-LOGICAL-c373t-b9325d30682a5fc261012298d2a40d732a6044ecfec229b93c53cf0a045332313</cites><orcidid>0000-0001-9174-3923 ; 0000-0002-2312-6672 ; 0000-0002-3157-7668 ; 0000-0001-6549-584X ; 0000-0003-1706-3406 ; 0000-0002-2266-5682 ; 0000-0003-4112-5769</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9064936$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>230,314,780,784,796,885,27924,27925,54758</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9064936$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://hal.science/hal-03032718$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Constantin, Mihai Gabriel</creatorcontrib><creatorcontrib>Stefan, Liviu-Daniel</creatorcontrib><creatorcontrib>Ionescu, Bogdan</creatorcontrib><creatorcontrib>Demarty, Claire-Helene</creatorcontrib><creatorcontrib>Sjoberg, Mats</creatorcontrib><creatorcontrib>Schedl, Markus</creatorcontrib><creatorcontrib>Gravier, Guillaume</creatorcontrib><title>Affect in Multimedia: Benchmarking Violent Scenes Detection</title><title>IEEE transactions on affective computing</title><addtitle>TAFFC</addtitle><description>In this article, we report on the creation of a publicly available, common evaluation framework for Violent Scenes Detection (VSD) in Hollywood and YouTube videos. 
We propose a robust data set, the VSD96, with more than 96 hours of video of various genres, annotations at different levels of detail (e.g., shot-level, segment-level), annotations of mid-level concepts (e.g., blood, fire), various pre-computed multi-modal descriptors, and over 230 system output results as baselines. This is the most comprehensive data set available to this date tailored to the VSD task and was extensively validated during the MediaEval benchmarking campaigns. Furthermore, we provide an in-depth analysis of the crucial components of VSD algorithms, by reviewing the capabilities and the evolution of existing systems (e.g., overall trends and outliers, the influence of the employed features and fusion techniques, the influence of deep learning approaches). Finally, we discuss the possibility of going beyond state-of-the-art performance via an ad-hoc late fusion approach. Experimentation is carried out on the VSD96 data. We provide the most important lessons learned and gained insights. The increasing number of publications using the VSD96 data underline the importance of the topic. 
The presented and published resources are a practitioner's guide and also a strong baseline to overcome, which will help researchers for the coming years in analyzing aspects of audio-visual affect and violence detection in movies and videos.</description><subject>Algorithms</subject><subject>Annotations</subject><subject>Benchmark testing</subject><subject>benchmarking</subject><subject>Benchmarks</subject><subject>Computer Science</subject><subject>Computer Vision and Pattern Recognition</subject><subject>Datasets</subject><subject>Experimentation</subject><subject>literature review</subject><subject>Machine Learning</subject><subject>Motion pictures</subject><subject>multi-modal content description</subject><subject>Multimedia</subject><subject>Outliers (statistics)</subject><subject>Sound</subject><subject>Task analysis</subject><subject>Video</subject><subject>Videos</subject><subject>Violent scenes detection</subject><subject>Visual aspects</subject><subject>VSD96 data set</subject><subject>YouTube</subject><issn>1949-3045</issn><issn>1949-3045</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkEFLwzAYhoMoOOb-gF4Knjx0fsnXpoue6nROmHhweg0xTV1ml86mE_z3plaGuSR8PO-Xh5eQUwpjSkFcLvPZbDpmwGDMxIQLLg7IgIpExAhJevjvfUxG3q8hHETkLBuQ67wsjW4j66LHXdXajSmsuopujNOrjWo-rHuPXm1dGddGz9o446Nb04aErd0JOSpV5c3o7x6Sl9ndcjqPF0_3D9N8EWvMsI3fBLK0QOATptJSM06BsiBaMJVAkSFTHJLE6OARxoHWKeoSVPBFZEhxSC76vStVyW1jg9e3rJWV83whuxkgIMvo5Ktjz3t229SfO-Nbua53jQt6knFMw0eBDBTrKd3U3jem3K-lILtO5W-nsutU_nUaQmd9yBpj9gEBPBHI8Qff527r</recordid><startdate>20220101</startdate><enddate>20220101</enddate><creator>Constantin, Mihai Gabriel</creator><creator>Stefan, Liviu-Daniel</creator><creator>Ionescu, Bogdan</creator><creator>Demarty, Claire-Helene</creator><creator>Sjoberg, Mats</creator><creator>Schedl, Markus</creator><creator>Gravier, Guillaume</creator><general>IEEE</general><general>The Institute 
of Electrical and Electronics Engineers, Inc. (IEEE)</general><general>Institute of Electrical and Electronics Engineers</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>1XC</scope><scope>VOOES</scope><orcidid>https://orcid.org/0000-0001-9174-3923</orcidid><orcidid>https://orcid.org/0000-0002-2312-6672</orcidid><orcidid>https://orcid.org/0000-0002-3157-7668</orcidid><orcidid>https://orcid.org/0000-0001-6549-584X</orcidid><orcidid>https://orcid.org/0000-0003-1706-3406</orcidid><orcidid>https://orcid.org/0000-0002-2266-5682</orcidid><orcidid>https://orcid.org/0000-0003-4112-5769</orcidid></search><sort><creationdate>20220101</creationdate><title>Affect in Multimedia: Benchmarking Violent Scenes Detection</title><author>Constantin, Mihai Gabriel ; Stefan, Liviu-Daniel ; Ionescu, Bogdan ; Demarty, Claire-Helene ; Sjoberg, Mats ; Schedl, Markus ; Gravier, Guillaume</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c373t-b9325d30682a5fc261012298d2a40d732a6044ecfec229b93c53cf0a045332313</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Annotations</topic><topic>Benchmark testing</topic><topic>benchmarking</topic><topic>Benchmarks</topic><topic>Computer Science</topic><topic>Computer Vision and Pattern Recognition</topic><topic>Datasets</topic><topic>Experimentation</topic><topic>literature review</topic><topic>Machine Learning</topic><topic>Motion pictures</topic><topic>multi-modal content description</topic><topic>Multimedia</topic><topic>Outliers (statistics)</topic><topic>Sound</topic><topic>Task analysis</topic><topic>Video</topic><topic>Videos</topic><topic>Violent scenes detection</topic><topic>Visual aspects</topic><topic>VSD96 data 
set</topic><topic>YouTube</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Constantin, Mihai Gabriel</creatorcontrib><creatorcontrib>Stefan, Liviu-Daniel</creatorcontrib><creatorcontrib>Ionescu, Bogdan</creatorcontrib><creatorcontrib>Demarty, Claire-Helene</creatorcontrib><creatorcontrib>Sjoberg, Mats</creatorcontrib><creatorcontrib>Schedl, Markus</creatorcontrib><creatorcontrib>Gravier, Guillaume</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><jtitle>IEEE transactions on affective computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Constantin, Mihai Gabriel</au><au>Stefan, Liviu-Daniel</au><au>Ionescu, Bogdan</au><au>Demarty, Claire-Helene</au><au>Sjoberg, Mats</au><au>Schedl, Markus</au><au>Gravier, Guillaume</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Affect in Multimedia: Benchmarking Violent Scenes Detection</atitle><jtitle>IEEE transactions on affective 
computing</jtitle><stitle>TAFFC</stitle><date>2022-01-01</date><risdate>2022</risdate><volume>13</volume><issue>1</issue><spage>347</spage><epage>366</epage><pages>347-366</pages><issn>1949-3045</issn><eissn>1949-3045</eissn><coden>ITACBQ</coden><abstract>In this article, we report on the creation of a publicly available, common evaluation framework for Violent Scenes Detection (VSD) in Hollywood and YouTube videos. We propose a robust data set, the VSD96, with more than 96 hours of video of various genres, annotations at different levels of detail (e.g., shot-level, segment-level), annotations of mid-level concepts (e.g., blood, fire), various pre-computed multi-modal descriptors, and over 230 system output results as baselines. This is the most comprehensive data set available to this date tailored to the VSD task and was extensively validated during the MediaEval benchmarking campaigns. Furthermore, we provide an in-depth analysis of the crucial components of VSD algorithms, by reviewing the capabilities and the evolution of existing systems (e.g., overall trends and outliers, the influence of the employed features and fusion techniques, the influence of deep learning approaches). Finally, we discuss the possibility of going beyond state-of-the-art performance via an ad-hoc late fusion approach. Experimentation is carried out on the VSD96 data. We provide the most important lessons learned and gained insights. The increasing number of publications using the VSD96 data underline the importance of the topic. 
The presented and published resources are a practitioner's guide and also a strong baseline to overcome, which will help researchers for the coming years in analyzing aspects of audio-visual affect and violence detection in movies and videos.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TAFFC.2020.2986969</doi><tpages>20</tpages><orcidid>https://orcid.org/0000-0001-9174-3923</orcidid><orcidid>https://orcid.org/0000-0002-2312-6672</orcidid><orcidid>https://orcid.org/0000-0002-3157-7668</orcidid><orcidid>https://orcid.org/0000-0001-6549-584X</orcidid><orcidid>https://orcid.org/0000-0003-1706-3406</orcidid><orcidid>https://orcid.org/0000-0002-2266-5682</orcidid><orcidid>https://orcid.org/0000-0003-4112-5769</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1949-3045
ispartof IEEE transactions on affective computing, 2022-01, Vol.13 (1), p.347-366
issn 1949-3045
1949-3045
language eng
recordid cdi_proquest_journals_2635044327
source IEEE Electronic Library (IEL)
subjects Algorithms
Annotations
Benchmark testing
benchmarking
Benchmarks
Computer Science
Computer Vision and Pattern Recognition
Datasets
Experimentation
literature review
Machine Learning
Motion pictures
multi-modal content description
Multimedia
Outliers (statistics)
Sound
Task analysis
Video
Videos
Violent scenes detection
Visual aspects
VSD96 data set
YouTube
title Affect in Multimedia: Benchmarking Violent Scenes Detection
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T23%3A11%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Affect%20in%20Multimedia:%20Benchmarking%20Violent%20Scenes%20Detection&rft.jtitle=IEEE%20transactions%20on%20affective%20computing&rft.au=Constantin,%20Mihai%20Gabriel&rft.date=2022-01-01&rft.volume=13&rft.issue=1&rft.spage=347&rft.epage=366&rft.pages=347-366&rft.issn=1949-3045&rft.eissn=1949-3045&rft.coden=ITACBQ&rft_id=info:doi/10.1109/TAFFC.2020.2986969&rft_dat=%3Cproquest_RIE%3E2635044327%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2635044327&rft_id=info:pmid/&rft_ieee_id=9064936&rfr_iscdi=true