Lyrics segmentation via bimodal text–audio representation
Song lyrics contain repeated patterns that have been proven to facilitate automated lyrics segmentation, with the final goal of detecting the building blocks (e.g., chorus, verse) of a song text. Our contribution in this article is twofold. First, we introduce a convolutional neural network (CNN)-based model that learns to segment the lyrics based on their repetitive text structure. We experiment with novel features to reveal different kinds of repetitions in the lyrics, for instance based on phonetical and syntactical properties. Second, using a novel corpus where the song text is synchronized to the audio of the song, we show that the text and audio modalities capture complementary structure of the lyrics and that combining both is beneficial for lyrics segmentation performance. For the purely text-based lyrics segmentation on a dataset of 103k lyrics, we achieve an F-score of 67.4%, improving on the state of the art (59.2% F-score). On the synchronized text–audio dataset of 4.8k songs, we show that the additional audio features improve segmentation performance to 75.3% F-score, significantly outperforming the purely text-based approaches.
Saved in:
Published in: | Natural language engineering 2022-05, Vol.28 (3), p.317-336 |
---|---|
Main authors: | Fell, Michael; Nechaev, Yaroslav; Meseguer-Brocal, Gabriel; Cabrio, Elena; Gandon, Fabien; Peeters, Geoffroy |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 336 |
---|---|
container_issue | 3 |
container_start_page | 317 |
container_title | Natural language engineering |
container_volume | 28 |
creator | Fell, Michael; Nechaev, Yaroslav; Meseguer-Brocal, Gabriel; Cabrio, Elena; Gandon, Fabien; Peeters, Geoffroy |
description | Song lyrics contain repeated patterns that have been proven to facilitate automated lyrics segmentation, with the final goal of detecting the building blocks (e.g., chorus, verse) of a song text. Our contribution in this article is twofold. First, we introduce a convolutional neural network (CNN)-based model that learns to segment the lyrics based on their repetitive text structure. We experiment with novel features to reveal different kinds of repetitions in the lyrics, for instance based on phonetical and syntactical properties. Second, using a novel corpus where the song text is synchronized to the audio of the song, we show that the text and audio modalities capture complementary structure of the lyrics and that combining both is beneficial for lyrics segmentation performance. For the purely text-based lyrics segmentation on a dataset of 103k lyrics, we achieve an F-score of 67.4%, improving on the state of the art (59.2% F-score). On the synchronized text–audio dataset of 4.8k songs, we show that the additional audio features improve segmentation performance to 75.3% F-score, significantly outperforming the purely text-based approaches. |
doi_str_mv | 10.1017/S1351324921000024 |
format | Article |
fullrecord | Publisher: Cambridge University Press (Cambridge, UK). ISSN: 1351-3249; EISSN: 1469-8110; DOI: 10.1017/S1351324921000024. Rights: The Author(s), 2021; published by Cambridge University Press; distributed under a Creative Commons Attribution 4.0 International License. Peer reviewed; open access via HAL (hal-03295581). |
fulltext | fulltext |
identifier | ISSN: 1351-3249 |
ispartof | Natural language engineering, 2022-05, Vol.28 (3), p.317-336 |
issn | 1351-3249; 1469-8110 |
language | eng |
recordid | cdi_hal_primary_oai_HAL_hal_03295581v1 |
source | Cambridge University Press Journals Complete |
subjects | Acoustics; Artificial Intelligence; Artificial neural networks; Automation; Computation and Language; Computer Science; Datasets; Experiments; Information retrieval; Labeling; Lyrics; Mass media; Music; Search engines; Segmentation; Semantics; Songs; Syntax; Text structure |
title | Lyrics segmentation via bimodal text–audio representation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T15%3A52%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Lyrics%20segmentation%20via%20bimodal%20text%E2%80%93audio%20representation&rft.jtitle=Natural%20language%20engineering&rft.au=Fell,%20Michael&rft.date=2022-05-01&rft.volume=28&rft.issue=3&rft.spage=317&rft.epage=336&rft.pages=317-336&rft.issn=1351-3249&rft.eissn=1469-8110&rft_id=info:doi/10.1017/S1351324921000024&rft_dat=%3Cproquest_hal_p%3E2647985099%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2647985099&rft_id=info:pmid/&rft_cupid=10_1017_S1351324921000024&rfr_iscdi=true |
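The abstract describes segmenting lyrics by exposing their repeated patterns. A common way to make such repetitions visible (and a typical input representation for CNN-based segmenters of this kind) is a line-by-line self-similarity matrix. The sketch below is only an illustration under that assumption, not the authors' implementation: the function name `self_similarity_matrix` and the character-overlap measure are hypothetical stand-ins for the paper's phonetic and syntactic repetition features.

```python
from difflib import SequenceMatcher

def self_similarity_matrix(lines):
    """Build an n x n matrix where cell (i, j) holds the string
    similarity of lyric line i and lyric line j. Repeated sections
    such as a chorus appear as high-valued parallel diagonals,
    which a downstream segmenter can pick up as boundary cues."""
    n = len(lines)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            ssm[i][j] = SequenceMatcher(None, lines[i], lines[j]).ratio()
    return ssm

# Toy lyric with a repeated line standing in for a chorus.
lyrics = [
    "we will rock you",        # chorus line
    "buddy you're a boy",      # verse line
    "we will rock you",        # chorus line repeats
]
ssm = self_similarity_matrix(lyrics)
# The off-diagonal 1.0 at (0, 2) marks the repetition that a
# segmentation model would use as evidence of section structure.
```

In the article's actual setting, several such matrices (lexical, phonetic, syntactic, plus audio-derived ones) would be stacked as channels and fed to the CNN; this sketch shows only the single lexical channel.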