Robust acoustic domain identification with its application to speaker diarization
With the rise in multimedia content over the years, more variety is observed in the recording environments of audio. An audio processing system might benefit when it has a module to identify the acoustic domain at its front-end. In this paper, we demonstrate the idea of acoustic domain identification (ADI) for speaker diarization. For this, we first present a detailed study of the various domains of the third DIHARD challenge, highlighting the factors that differentiate them from each other. Our main contribution is to develop a simple and efficient solution for ADI. In the present work, we explore speaker embeddings for this task. Next, we integrate the ADI module with the speaker diarization framework of the DIHARD III challenge. The performance substantially improved over that of the baseline when the thresholds for agglomerative hierarchical clustering were optimized according to the respective domains. We achieved a relative improvement of more than 5% and 8% in DER for core and full conditions, respectively, on Track 1 of the DIHARD III evaluation set.
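The 5% and 8% figures are relative reductions in diarization error rate (DER). As a point of reference (this is the standard definition, not a formula quoted from the paper, and the symbols below are introduced here), relative improvement is computed as

$$\text{relative improvement} = \frac{\mathrm{DER}_{\mathrm{baseline}} - \mathrm{DER}_{\mathrm{system}}}{\mathrm{DER}_{\mathrm{baseline}}} \times 100\%,$$

so a purely hypothetical baseline DER of 20.0% reduced to 18.9% would count as a 5.5% relative improvement.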
Saved in:
Published in: | International journal of speech technology 2022-12, Vol.25 (4), p.933-945 |
Main authors: | Kumar, A Kishore; Waldekar, Shefali; Sahidullah, Md; Saha, Goutam |
Format: | Article |
Language: | English |
Subjects: | Acoustics; Artificial Intelligence; Cluster analysis; Clustering; Computer Science; Computer Vision and Pattern Recognition; Domains; Engineering; Identification; Modules; Multimedia; Signal and Image Processing; Signal, Image and Speech Processing; Social Sciences; Sound |
Online access: | Full text |
container_end_page | 945 |
container_issue | 4 |
container_start_page | 933 |
container_title | International journal of speech technology |
container_volume | 25 |
creator | Kumar, A Kishore; Waldekar, Shefali; Sahidullah, Md; Saha, Goutam |
description | With the rise in multimedia content over the years, more variety is observed in the recording environments of audio. An audio processing system might benefit when it has a module to identify the acoustic domain at its front-end. In this paper, we demonstrate the idea of acoustic domain identification (ADI) for speaker diarization. For this, we first present a detailed study of the various domains of the third DIHARD challenge, highlighting the factors that differentiate them from each other. Our main contribution is to develop a simple and efficient solution for ADI. In the present work, we explore speaker embeddings for this task. Next, we integrate the ADI module with the speaker diarization framework of the DIHARD III challenge. The performance substantially improved over that of the baseline when the thresholds for agglomerative hierarchical clustering were optimized according to the respective domains. We achieved a relative improvement of more than 5% and 8% in DER for core and full conditions, respectively, on Track 1 of the DIHARD III evaluation set. |
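The description sketches a two-stage pipeline: an ADI front-end assigns each recording to a domain, and the diarization back-end then stops agglomerative hierarchical clustering (AHC) at a domain-specific distance threshold. The sketch below is a minimal illustration of that idea, assuming cosine-scored speaker embeddings (e.g., x-vectors); the domain names, centroid dictionary, and threshold values are hypothetical placeholders, not the authors' implementation or the tuned values reported in the paper.

```python
# Minimal sketch (NOT the paper's implementation): nearest-centroid ADI over
# speaker embeddings, then AHC with a per-domain stopping threshold.
# Domain names and threshold values are hypothetical placeholders.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

# Hypothetical per-domain AHC distance thresholds (placeholders, not tuned values).
DOMAIN_THRESHOLDS = {"broadcast_interview": 0.55, "meeting": 0.70, "webvideo": 0.60}

def identify_domain(recording_embedding: np.ndarray, domain_centroids: dict) -> str:
    """Return the domain whose centroid is nearest in cosine distance."""
    names = list(domain_centroids)
    centroids = np.stack([domain_centroids[n] for n in names])
    dists = cdist(recording_embedding[None, :], centroids, metric="cosine")[0]
    return names[int(np.argmin(dists))]

def diarize_segments(segment_embeddings: np.ndarray, domain: str) -> np.ndarray:
    """Cluster per-segment speaker embeddings with the domain's AHC threshold."""
    threshold = DOMAIN_THRESHOLDS[domain]
    # Average-linkage AHC on cosine distances between segment embeddings.
    Z = linkage(segment_embeddings, method="average", metric="cosine")
    # Cut the dendrogram where the merge distance exceeds the threshold;
    # the resulting labels (1..K) are the hypothesized speakers.
    return fcluster(Z, t=threshold, criterion="distance")
```

In a real system the centroids would be estimated from labeled training recordings and each threshold tuned on a development set, which matches the per-domain threshold optimization described above.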
doi_str_mv | 10.1007/s10772-022-09990-9 |
format | Article |
publisher | New York: Springer US |
rights | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022; distributed under a Creative Commons Attribution 4.0 International License |
orcid | https://orcid.org/0000-0002-0616-7608 |
identifier | ISSN: 1381-2416; EISSN: 1572-8110 |
ispartof | International journal of speech technology, 2022-12, Vol.25 (4), p.933-945 |
issn | 1381-2416; 1572-8110 |
language | eng |
recordid | cdi_hal_primary_oai_HAL_hal_03719697v1 |
source | SpringerLink Journals - AutoHoldings |
subjects | Acoustics; Artificial Intelligence; Cluster analysis; Clustering; Computer Science; Computer Vision and Pattern Recognition; Domains; Engineering; Identification; Modules; Multimedia; Signal and Image Processing; Signal, Image and Speech Processing; Social Sciences; Sound |
title | Robust acoustic domain identification with its application to speaker diarization |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T07%3A59%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20acoustic%20domain%20identification%20with%20its%20application%20to%20speaker%20diarization&rft.jtitle=International%20journal%20of%20speech%20technology&rft.au=Kumar,%20A%20Kishore&rft.date=2022-12-01&rft.volume=25&rft.issue=4&rft.spage=933&rft.epage=945&rft.pages=933-945&rft.issn=1381-2416&rft.eissn=1572-8110&rft_id=info:doi/10.1007/s10772-022-09990-9&rft_dat=%3Cproquest_hal_p%3E2753690969%3C/proquest_hal_p%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2753690969&rft_id=info:pmid/&rfr_iscdi=true |