Efficiency-oriented approaches for self-supervised speech representation learning
Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and speech. In particular, the state of the art in several speech processing applications, such as automatic speech recognition or speaker identification, consists of models whose latent representations are learned using self-supervised approaches.
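The abstract names contrastive objectives as one of the main configurations for self-supervised speech learning. As a hedged illustration, the sketch below shows a generic InfoNCE-style contrastive loss of the kind such models optimize; the function name, tensor shapes, and temperature are assumptions for the example, not the loss of any specific published model.

```python
# Hedged sketch: a generic InfoNCE-style contrastive objective, of the kind
# used in self-supervised speech models. Shapes and names are illustrative
# assumptions, not the exact loss of any particular published model.
import torch
import torch.nn.functional as F

def info_nce_loss(context, targets, negatives, temperature=0.1):
    """context:   (B, D) contextual representations at masked positions.
    targets:   (B, D) the true latents for those positions.
    negatives: (B, K, D) distractor latents sampled from other positions."""
    # Cosine similarity between each context vector and its true target.
    pos = F.cosine_similarity(context, targets, dim=-1).unsqueeze(-1)   # (B, 1)
    # Cosine similarity with each of the K negatives (broadcast over K).
    neg = F.cosine_similarity(context.unsqueeze(1), negatives, dim=-1)  # (B, K)
    logits = torch.cat([pos, neg], dim=-1) / temperature                # (B, 1+K)
    # The true target is always at index 0, so all labels are zero.
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs.
B, K, D = 8, 10, 256
loss = info_nce_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
```

In practice, the positives and negatives would come from masked positions within the same utterance, which is what makes the objective label-free.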
Saved in:
| Published in: | International journal of speech technology, 2024, Vol. 27 (3), p. 765-779 |
|---|---|
| Main authors: | Lugo, Luis; Vielzeuf, Valentin |
| Format: | Article |
| Language: | eng |
| Subjects: | Artificial Intelligence; Automatic speech recognition; Biology; Computer vision; Computing costs; Datasets; Efficiency; Energy consumption; Energy costs; Engineering; Machine learning; Natural language processing; Representations; Self-supervised learning; Signal, Image and Speech Processing; Social Sciences; Speaker identification; Speech; Speech processing; Speech recognition; Voice recognition |
| Online access: | Full text |
| container_end_page | 779 |
|---|---|
| container_issue | 3 |
| container_start_page | 765 |
| container_title | International journal of speech technology |
| container_volume | 27 |
| creator | Lugo, Luis; Vielzeuf, Valentin |
| description | Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including computer vision, natural language processing, biology, and speech. In particular, the state of the art in several speech processing applications, such as automatic speech recognition or speaker identification, consists of models whose latent representations are learned using self-supervised approaches. Several configurations exist in self-supervised learning for speech, including contrastive, predictive, and multilingual approaches. There is, however, a crucial limitation in the majority of existing approaches: their high computational costs. These costs limit the deployment of models, the size of the training dataset, and the number of research groups that can afford research with large self-supervised models. Likewise, we should consider the environmental costs that high energy consumption implies. Efforts in this direction comprise optimization of existing models, neural architecture efficiency, improvements in finetuning for speech processing tasks, and data efficiency. But despite current efforts, more work could be done to address high computational costs in self-supervised representation learning. |
| doi_str_mv | 10.1007/s10772-024-10121-9 |
| format | Article |
| fulltext | fulltext |
| identifier | ISSN: 1381-2416 |
| ispartof | International journal of speech technology, 2024, Vol.27 (3), p.765-779 |
| issn | 1381-2416; 1572-8110 |
| language | eng |
| recordid | cdi_proquest_journals_3103963874 |
| source | Springer Nature - Complete Springer Journals |
| subjects | Artificial Intelligence; Automatic speech recognition; Biology; Computer vision; Computing costs; Datasets; Efficiency; Energy consumption; Energy costs; Engineering; Machine learning; Natural language processing; Representations; Self-supervised learning; Signal, Image and Speech Processing; Social Sciences; Speaker identification; Speech; Speech processing; Speech recognition; Voice recognition |
| title | Efficiency-oriented approaches for self-supervised speech representation learning |
| url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T16%3A43%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Efficiency-oriented%20approaches%20for%20self-supervised%20speech%20representation%20learning&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Lugo,%20Luis&rft.date=2024&rft.volume=27&rft.issue=3&rft.spage=765&rft.epage=779&rft.pages=765-779&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-024-10121-9&rft_dat=%3Cproquest_cross%3E3103963874%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3103963874&rft_id=info:pmid/&rfr_iscdi=true |
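The record's abstract also points to improvements in finetuning for speech processing tasks as one efficiency direction. As a hedged illustration of the common pattern of freezing a pretrained self-supervised encoder and training only a lightweight head, here is a minimal sketch; the checkpoint name, mean pooling, and ten-class head are assumptions for the example, not the article's experimental setup.

```python
# Hedged sketch: extracting latent representations from a pretrained
# self-supervised speech encoder for a downstream task. The checkpoint,
# pooling, and classifier head are illustrative assumptions only.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
encoder.eval()

# One second of fake 16 kHz audio standing in for a real utterance.
waveform = torch.randn(16000).numpy()
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")

# Forward pass under no_grad: the encoder receives no gradients,
# so only the small head below is trained.
with torch.no_grad():
    hidden = encoder(inputs.input_values).last_hidden_state  # (1, T, 768)

# Mean-pool over time and attach a linear head, e.g. for a hypothetical
# 10-class speaker identification task.
pooled = hidden.mean(dim=1)      # (1, 768)
head = torch.nn.Linear(768, 10)
logits = head(pooled)
```

Because the encoder's forward pass runs without gradients, only the linear head's parameters are updated during training, which keeps downstream adaptation cheap relative to full finetuning of the encoder.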