Towards automated vocal mode classification in healthy singing voice – an XGBoost decision tree-based machine learning classifier
Auditory-perceptual assessment is widely used in clinical and pedagogical practice for speech and singing voice, yet several studies have shown poor intra- and inter-rater reliability in both clinical and singing voice contexts. Recent advances in artificial intelligence and machine learning offer models for automated classification and have demonstrated discriminatory power in both pathological and healthy voice. This study develops and tests an XGBoost decision tree-based machine learning classifier for automated vocal mode classification in healthy singing voice. Classification models trained on mel-frequency cepstral coefficients (MFCCs), MFCC-Zero-Time Windowing, glottal features, voice quality features, and α-ratios achieved a 92% average F1-score in distinguishing metallic from non-metallic singing for male singers and 87% for female singers. The models distinguished individual vocal modes with 70% and 69% average F1-scores for male and female samples, respectively. Model performance was compared with human auditory-perceptual assessments of 64 corresponding samples performed by 41 professional singers; the models approached but did not exceed the human assessors on task-matched problems. The XGBoost gains observed across the tested features show that the most important attributes for the tested classification problems were MFCCs and α-ratios between high- and low-frequency energy: models trained on only these features performed not statistically significantly differently from the best tested models. The best automated models in this study do not yet match human auditory-perceptual discrimination, but they improve on previously reported average F1-scores for automated classification of singing voice.
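The abstract singles out MFCCs and α-ratios between high- and low-frequency energy as the most informative feature groups. As a rough illustration only, and not the authors' pipeline, the Python sketch below extracts those two feature types with librosa and numpy; the sample rate, number of coefficients, and 1 kHz band split are assumptions, and the study's MFCC-Zero-Time-Windowing, glottal, and voice quality features are not reproduced here.

```python
# Hypothetical feature extraction: per-file mean MFCCs plus one alpha-ratio.
# Parameter choices (sr, n_mfcc, 1 kHz split) are illustrative assumptions.
import numpy as np
import librosa

def extract_features(path, sr=16000, split_hz=1000.0, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)

    # MFCCs averaged over time, giving one coefficient vector per recording.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    mfcc_mean = mfcc.mean(axis=1)

    # Alpha-ratio: high-band vs low-band spectral energy, expressed in dB.
    power = np.abs(librosa.stft(y)) ** 2
    freqs = librosa.fft_frequencies(sr=sr)  # matches stft's default n_fft
    low = power[freqs < split_hz].sum()
    high = power[freqs >= split_hz].sum()
    alpha_ratio_db = 10.0 * np.log10((high + 1e-12) / (low + 1e-12))

    return np.concatenate([mfcc_mean, [alpha_ratio_db]])
```

Averaging frame-level MFCCs is only one way to obtain a fixed-length vector per recording; frame-level statistics or other functionals would serve equally well as classifier input.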
Saved in:
Published in: | Journal of voice, 2023-11 |
---|---|
Main authors: | Sol, Jeroen; Aaen, Mathias; Sadolin, Cathrine; ten Bosch, Louis |
Format: | Article |
Language: | English |
Subjects: | Artificial Intelligence; Complete Vocal Technique; Machine Learning; Singing Voice; Vocal Modes |
Online access: | Full text |
DOI: | 10.1016/j.jvoice.2023.09.006 |
ISSN: | 0892-1997 (print); 1873-4588 (electronic) |
Publisher: | Elsevier Inc |
Source: | Access via ScienceDirect (Elsevier) |
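The abstract reports average F1-scores for an XGBoost decision tree classifier and ranks features by XGBoost gain. The following minimal sketch shows that kind of training, evaluation, and gain-based ranking with the xgboost and scikit-learn packages. It runs on randomly generated placeholder data rather than the study's recordings, the hyperparameters are illustrative assumptions rather than the authors' settings, and macro-averaged F1 is used as one common reading of "average F1-score".

```python
# Illustrative binary "metallic vs non-metallic" classifier on placeholder data.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))    # e.g. 13 mean MFCCs + 1 alpha-ratio per sample
y = rng.integers(0, 2, size=200)  # placeholder labels: 1 = metallic, 0 = non-metallic

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = XGBClassifier(
    n_estimators=200, max_depth=4, learning_rate=0.1, eval_metric="logloss"
)
model.fit(X_tr, y_tr)

# Macro-averaged F1 weights both classes equally.
pred = model.predict(X_te)
print("macro F1:", f1_score(y_te, pred, average="macro"))

# Gain-based importances correspond to the "XGBoost gains" mentioned in the abstract.
gains = model.get_booster().get_score(importance_type="gain")
print(sorted(gains.items(), key=lambda kv: kv[1], reverse=True)[:5])
```

Macro-averaging the F1-score treats both classes equally, so an imbalanced split between metallic and non-metallic samples does not inflate the headline number.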