Robust acoustic domain identification with its application to speaker diarization

With the rise in multimedia content over the years, more variety is observed in the recording environments of audio. An audio processing system might benefit when it has a module to identify the acoustic domain at its front-end. In this paper, we demonstrate the idea of acoustic domain identification (ADI) for speaker diarization.

Detailed description

Saved in:
Bibliographic details
Published in: International journal of speech technology 2022-12, Vol.25 (4), p.933-945
Main authors: Kumar, A Kishore, Waldekar, Shefali, Sahidullah, Md, Saha, Goutam
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 945
container_issue 4
container_start_page 933
container_title International journal of speech technology
container_volume 25
creator Kumar, A Kishore
Waldekar, Shefali
Sahidullah, Md
Saha, Goutam
description With the rise in multimedia content over the years, more variety is observed in the recording environments of audio. An audio processing system might benefit when it has a module to identify the acoustic domain at its front-end. In this paper, we demonstrate the idea of acoustic domain identification (ADI) for speaker diarization. For this, we first present a detailed study of the various domains of the third DIHARD challenge, highlighting the factors that differentiate them from each other. Our main contribution is to develop a simple and efficient solution for ADI. In the present work, we explore speaker embeddings for this task. Next, we integrate the ADI module with the speaker diarization framework of the DIHARD III challenge. The performance substantially improved over that of the baseline when the thresholds for agglomerative hierarchical clustering were optimized according to the respective domains. We achieved a relative improvement of more than 5% and 8% in DER for the core and full conditions, respectively, on Track 1 of the DIHARD III evaluation set.
doi_str_mv 10.1007/s10772-022-09990-9
format Article
fulltext fulltext
identifier ISSN: 1381-2416
ispartof International journal of speech technology, 2022-12, Vol.25 (4), p.933-945
issn 1381-2416
1572-8110
language eng
recordid cdi_hal_primary_oai_HAL_hal_03719697v1
source SpringerLink Journals - AutoHoldings
subjects Acoustics
Artificial Intelligence
Cluster analysis
Clustering
Computer Science
Computer Vision and Pattern Recognition
Domains
Engineering
Identification
Modules
Multimedia
Signal and Image Processing
Signal,Image and Speech Processing
Social Sciences
Sound
title Robust acoustic domain identification with its application to speaker diarization