AI Generated Music Using Speech Emotion Recognition

This study aims to compare two different implementations of speech emotion recognition models. The emphasis is directed towards evaluating their efficacy in capturing and characterizing dialogues portrayed by actors within a film scene to create suitable musical intervals. The goal of the overarching...

Detailed Description

Bibliographic Details
Main Authors: Murru, Roberto; Krug, Jonas; Schmid, Tom; Steba, Garri; Giacinto, Giorgio; von Hoffmann, Alexander
Format: Dataset
Language: eng
Online Access: Order full text

Description: This study aims to compare two different implementations of speech emotion recognition models. The emphasis is directed towards evaluating their efficacy in capturing and characterizing dialogues portrayed by actors within a film scene, in order to create suitable musical intervals. The overarching research aims to derive indications for enhancing the compositional process of film scores by recognizing the emotion in a particular scene. Based on established deep learning models, the study explores two distinct emotion classification metrics: the Six Emotion Prediction and the Valence/Arousal/Dominance Prediction. To facilitate a comparative analysis, a preliminary study and a subsequent survey are deployed. The preliminary study confirms a significant difference in the generated MIDI data; a survey is therefore essential to determine the better-fitting algorithm. Participants are tasked with rating the affective suitability of eight generated interval sequences for the corresponding film scenes. The suitability is quantified using a bidirectional rating system. Both model assessments are conducted within a uniform sound design, thus ensuring unbiased conditions for evaluation. A thorough examination of our extensive analysis shows an increasingly evident preference for method A.
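
The record does not include the generation code itself. As an illustration only, the following Python sketch shows one way the two prediction metrics described above could be mapped to a two-note MIDI interval; every label set, threshold, interval choice, and the mido-based output is an assumption made for the sketch, not the authors' implementation.

```python
# Hypothetical sketch: turning a speech-emotion prediction into a two-note
# MIDI interval. All mappings and thresholds below are illustrative
# assumptions; the dataset does not disclose the authors' actual code.
from mido import Message, MidiFile, MidiTrack

# Assumed label set for the "Six Emotion Prediction" metric, with an
# illustrative interval (in semitones above a root note) for each label.
SIX_EMOTION_INTERVALS = {
    "happiness": 7,   # perfect fifth: open, stable
    "sadness":   3,   # minor third: dark, soft
    "anger":     6,   # tritone: tense, unresolved
    "fear":      1,   # minor second: maximally dissonant
    "surprise":  9,   # major sixth: wide, bright
    "disgust":   2,   # major second: mildly clashing
}

def vad_to_interval(valence: float, arousal: float, dominance: float) -> int:
    """Map a Valence/Arousal/Dominance triple (each assumed in [0, 1]) to a
    semitone interval. Purely illustrative: positive valence picks from a
    consonant pool, negative valence from a dissonant one, and higher
    arousal widens the interval. Dominance is ignored in this toy mapping."""
    consonant = [4, 7, 12]   # major third, perfect fifth, octave
    dissonant = [1, 6, 10]   # minor second, tritone, minor seventh
    pool = consonant if valence >= 0.5 else dissonant
    index = min(int(arousal * len(pool)), len(pool) - 1)
    return pool[index]

def interval_to_midi(root: int, semitones: int, path: str) -> None:
    """Write the root note and the interval note as a short MIDI file."""
    mid = MidiFile()
    track = MidiTrack()
    mid.tracks.append(track)
    for note in (root, root + semitones):
        track.append(Message("note_on", note=note, velocity=64, time=0))
        track.append(Message("note_off", note=note, velocity=64, time=480))
    mid.save(path)

# Example: a calm, positive VAD prediction yields a consonant interval on C4,
# and a "sadness" label from the six-emotion metric yields a minor third.
interval_to_midi(60, vad_to_interval(0.8, 0.2, 0.5), "interval_vad.mid")
interval_to_midi(60, SIX_EMOTION_INTERVALS["sadness"], "interval_six.mid")
```

Tying consonance to positive valence and interval width to arousal is a common heuristic in affective music research; the mapping the authors actually used would have to be taken from the dataset itself.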
DOI: 10.34646/thn/ohmdok-1201
Publisher: Technische Hochschule Nürnberg Georg Simon Ohm
Published: 2023-11-21
Full record: https://commons.datacite.org/doi.org/10.34646/thn/ohmdok-1201
Source: DataCite