Applying multisensor in-car situations to detect violence

Violence recognition is challenging because violence can appear in very different forms: in an image as one person hitting another, or in audio as one person being verbally abusive to another. Audio and video are therefore essential signals to analyse. In the audio approach, speech, music, and ambient sound are the main domains of interest, since it is necessary to find the similarities and differences between them. In the video approach, human activity can be classified into four categories, depending on the complexity of the action and the number of body parts involved: gestures, actions, interactions, and activities. Recognizing human actions in video is challenging precisely because of this varied set of activities. Moreover, in recent years the use of deep learning in this area has grown enormously, because its results surpass traditional signal processing by a large margin. This article detects violence from audio and video signals captured inside a vehicle. For the audio signal, the architecture is a ResNet model applied to Mel-spectrograms. For the video signal, RGB frames are the input representation, applied to four different models: C2D, I3D, X3D, and Flow-Gated. Finally, multimodal fusion combines the two modalities at the end of the process.
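To make the audio pathway concrete, the following is a minimal Python sketch of the approach named in the abstract: a log-Mel-spectrogram computed with torchaudio and fed to a ResNet classifier. The sample rate, spectrogram settings, the ResNet-18 backbone, the two-class label set and the placeholder file name "clip.wav" are illustrative assumptions, not the authors' exact configuration.

import torch
import torchaudio
import torchvision

# Turn a mono waveform (assumed 16 kHz) into a log-Mel-spectrogram "image".
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16_000, n_fft=1024,
                                            hop_length=512, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

def audio_to_features(waveform: torch.Tensor) -> torch.Tensor:
    # waveform: (1, num_samples) -> (3, n_mels, time), replicated to 3 channels
    # so that a standard image ResNet can consume it.
    spec = to_db(mel(waveform))
    return spec.repeat(3, 1, 1)

# Standard ResNet-18 with a two-class head (non-violent vs. violent).
model = torchvision.models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

waveform, sr = torchaudio.load("clip.wav")                 # placeholder path
waveform = torchaudio.functional.resample(waveform, sr, 16_000)
logits = model(audio_to_features(waveform).unsqueeze(0))   # shape (1, 2)
audio_scores = logits.softmax(dim=-1)                      # per-class probabilities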
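The abstract also states that multimodal fusion is applied at the end of the process. One simple way to realise this is late fusion of per-modality scores; the weighted average below is an assumed fusion rule for illustration, not necessarily the one used in the article.

import torch

def late_fusion(audio_scores: torch.Tensor, video_scores: torch.Tensor,
                audio_weight: float = 0.5) -> torch.Tensor:
    # Both inputs hold class probabilities of shape (batch, num_classes);
    # the weight would normally be tuned on a validation set.
    return audio_weight * audio_scores + (1.0 - audio_weight) * video_scores

# Example: audio gives P(violent) = 0.7, video gives 0.4, fused score is 0.55.
audio_scores = torch.tensor([[0.3, 0.7]])
video_scores = torch.tensor([[0.6, 0.4]])
fused = late_fusion(audio_scores, video_scores)
prediction = fused.argmax(dim=-1)   # index 1 == "violent" under this convention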

Bibliographic details
Published in: Expert systems, 2025-01, Vol. 42 (1), p. n/a
Main authors: Durães, Dalila; Santos, Flavio; Marcondes, Francisco Supino; Hammerschmidt, Niklas; Novais, Paulo
Format: Article
Language: English
Subjects: Audio; Audio signals; Body parts; in‐vehicle; Multimodal fusion; Signal processing; Speech processing; Video; Video signals; Violence; Violence detection
Online access: Full text
container_end_page n/a
container_issue 1
container_start_page
container_title Expert systems
container_volume 42
creator Durães, Dalila
Santos, Flavio
Marcondes, Francisco Supino
Hammerschmidt, Niklas
Novais, Paulo
description Violence recognition is challenging because violence can appear in very different forms: in an image as one person hitting another, or in audio as one person being verbally abusive to another. Audio and video are therefore essential signals to analyse. In the audio approach, speech, music, and ambient sound are the main domains of interest, since it is necessary to find the similarities and differences between them. In the video approach, human activity can be classified into four categories, depending on the complexity of the action and the number of body parts involved: gestures, actions, interactions, and activities. Recognizing human actions in video is challenging precisely because of this varied set of activities. Moreover, in recent years the use of deep learning in this area has grown enormously, because its results surpass traditional signal processing by a large margin. This article detects violence from audio and video signals captured inside a vehicle. For the audio signal, the architecture is a ResNet model applied to Mel-spectrograms. For the video signal, RGB frames are the input representation, applied to four different models: C2D, I3D, X3D, and Flow-Gated. Finally, multimodal fusion combines the two modalities at the end of the process. Funding: FCT - Fundação para a Ciência e a Tecnologia (039334)
doi_str_mv 10.1111/exsy.13356
format Article
fulltext fulltext
identifier ISSN: 0266-4720
ispartof Expert systems, 2025-01, Vol.42 (1), p.n/a
issn 0266-4720
1468-0394
language eng
recordid cdi_proquest_journals_3145850042
source Access via Wiley Online Library
subjects Audio
Audio signals
Body parts
in‐vehicle
Multimodal fusion
Signal processing
Speech processing
Video
Video signals
Violence
Violence detection
title Applying multisensor in-car situations to detect violence
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T14%3A36%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Applying%20multisensor%20in-car%20situations%20to%20detect%20violence&rft.jtitle=Expert%20systems&rft.au=Dur%C3%A3es,%20Dalila&rft.date=2025-01&rft.volume=42&rft.issue=1&rft.epage=n/a&rft.issn=0266-4720&rft.eissn=1468-0394&rft_id=info:doi/10.1111/exsy.13356&rft_dat=%3Cproquest_cross%3E3145850042%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3145850042&rft_id=info:pmid/&rfr_iscdi=true