Towards automated vocal mode classification in healthy singing voice – an XGBoost decision tree-based machine learning classifier

Detailed description:
Auditory-perceptual assessment is widely used in clinical and pedagogical practice for speech and singing voice, yet several studies have shown poor intra- and inter-rater reliability in both clinical and singing voice contexts. Recent advances in artificial intelligence and machine learning offer models for automated classification and have demonstrated discriminatory power in both pathological and healthy voice. This study develops and tests an XGBoost decision tree-based machine learning classifier for automated vocal mode classification in healthy singing voice. Classification models trained on mel-frequency cepstral coefficients (MFCCs), MFCC-Zero-Time Windowing, glottal features, voice quality features, and α-ratios achieved an average F1-score of 92% in distinguishing metallic from non-metallic singing for male singers and 87% for female singers. The models distinguished the vocal modes with average F1-scores of 70% and 69% for male and female samples, respectively. Model performance was compared with human auditory-perceptual assessments of 64 corresponding samples performed by 41 professional singers; on the task-matched problems, the model performed close to, or below, the human assessors. The XGBoost gains observed across the tested features show that the most important attributes for these classification problems were the MFCCs and the α-ratios between high- and low-frequency energy: models trained on only these features achieved performance not statistically significantly different from the best tested models. The best automated models in this study do not yet match human auditory-perceptual discrimination, but they improve on previously reported average F1-scores for automated classification of singing voice.
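
The pipeline described in the abstract (acoustic feature extraction, gradient-boosted tree classification, F1-score evaluation, and gain-based feature importance) can be outlined with a short Python sketch. This is not the authors' implementation: librosa and xgboost are used here only for illustration, and the MFCC settings, the 1 kHz band split for the α-ratio, the hyperparameters, and the `audio_paths`/`labels` variables are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of an XGBoost-based singing-voice classifier:
# mean MFCCs plus a simple alpha-ratio per recording, macro F1 evaluation, and
# gain-based feature importance. All settings below are illustrative assumptions.
import numpy as np
import librosa
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score


def extract_features(path, sr=16000, split_hz=1000.0):
    """Return mean MFCCs and an alpha-ratio (high- vs. low-band energy, in dB)."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    power = np.abs(librosa.stft(y)) ** 2          # power spectrogram
    freqs = librosa.fft_frequencies(sr=sr)        # centre frequency of each STFT bin
    low = power[freqs < split_hz].sum()
    high = power[freqs >= split_hz].sum()
    alpha_ratio = 10.0 * np.log10((high + 1e-12) / (low + 1e-12))
    return np.append(mfcc, alpha_ratio)


# `audio_paths` is a list of audio files and `labels` an integer array
# (e.g. 0 = non-metallic, 1 = metallic); both are assumed to exist.
X = np.vstack([extract_features(p) for p in audio_paths])
y = np.asarray(labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
clf.fit(X_tr, y_tr)

print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
# Gain per feature, analogous to the feature-importance analysis in the abstract.
print(clf.get_booster().get_score(importance_type="gain"))
```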

Bibliographic details
Published in: Journal of voice, 2023-11
Main authors: Sol, Jeroen; Aaen, Mathias; Sadolin, Cathrine; ten Bosch, Louis
Format: Article
Language: English
Subjects: Artificial Intelligence; Complete Vocal Technique; Machine Learning; Singing Voice; Vocal Modes
Online access: Full text
DOI: 10.1016/j.jvoice.2023.09.006
Publisher: Elsevier Inc
Publication date: 2023-11-10
ORCID iDs: 0000-0003-4441-1103; 0000-0003-4619-9222
ISSN: 0892-1997
EISSN: 1873-4588
Source: Access via ScienceDirect (Elsevier)
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T02%3A49%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Towards%20automated%20vocal%20mode%20classification%20in%20healthy%20singing%20voice%20%E2%80%93%20an%20XGBoost%20decision%20tree-based%20machine%20learning%20classifier&rft.jtitle=Journal%20of%20voice&rft.au=Sol,%20Jeroen&rft.date=2023-11-10&rft.issn=0892-1997&rft.eissn=1873-4588&rft_id=info:doi/10.1016/j.jvoice.2023.09.006&rft_dat=%3Cproquest_cross%3E2889590203%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2889590203&rft_id=info:pmid/&rft_els_id=S0892199723002813&rfr_iscdi=true