A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing

Spoken document retrieval for a specific context is an active area of research. It lets users search archives of speech data, a task that is too time-consuming and expensive to perform manually. In the current article, we focus on performing...

Bibliographic details
Published in: Multimedia tools and applications 2021-06, Vol.80 (14), p.22209-22229
Main authors: Gupta, Anishka; Yadav, Divakar
Format: Article
Language: eng
Online access: Full text
container_end_page 22229
container_issue 14
container_start_page 22209
container_title Multimedia tools and applications
container_volume 80
creator Gupta, Anishka
Yadav, Divakar
description Spoken document retrieval for a specific context is an active area of research. It lets users search archives of speech data, a task that is too time-consuming and expensive to perform manually. In the current article, we focus on retrieval of political speeches delivered in a variety of environments. The proposed technique takes an archive of spoken documents (audio files) as input and performs automatic speech recognition (ASR) on it to derive textual transcripts, using deep neural networks (DNN), hidden Markov models (HMM) and Gaussian mixture models (GMM). These transcripts are then pruned for indexing by applying pre-processing techniques. Next, a time- and space-efficient index of the documents is built using wavelet trees. The index is searched to count the occurrences of each word in the user's query. These counts are used to calculate term frequency-inverse document frequency (TF-IDF) scores, and the similarity of the query to each document is then calculated using cosine similarity. Finally, the documents are ranked by these scores in order of relevance. The proposed system thus develops a speech recognition system and introduces a novel indexing scheme based on wavelet trees for retrieving data.
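The retrieval steps named in the description (wavelet-tree index, occurrence counts, TF-IDF, cosine similarity, ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names (`WaveletTree`, `build_index`, `count`, `rank_documents`), the smoothed IDF formula, the restriction of document vectors to the query terms, and the toy transcripts are all assumptions made for illustration, and the ASR and pre-processing stages are taken as already done.

```python
import math


class WaveletTree:
    """Pointer-based wavelet tree over integer symbols in [lo, hi] (illustrative)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            lo, hi = (min(seq), max(seq)) if seq else (0, 0)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.prefix = [0]  # prefix[i] = number of symbols <= mid in seq[:i]
        if lo == hi or not seq:
            return
        mid = (lo + hi) // 2
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def rank(self, symbol, i):
        """Return how many times `symbol` occurs in the first i positions."""
        if symbol < self.lo or symbol > self.hi:
            return 0
        if self.lo == self.hi:
            return i  # every symbol routed to this leaf equals `symbol`
        if self.left is None:
            return 0  # empty internal node
        mid = (self.lo + self.hi) // 2
        zeros = self.prefix[i]
        if symbol <= mid:
            return self.left.rank(symbol, zeros)
        return self.right.rank(symbol, i - zeros)


def build_index(docs):
    """Concatenate tokenised documents and index them with one wavelet tree."""
    vocab, seq, bounds = {}, [], [0]
    for tokens in docs:
        for t in tokens:
            seq.append(vocab.setdefault(t, len(vocab)))
        bounds.append(len(seq))  # bounds[d]..bounds[d+1] delimits document d
    return WaveletTree(seq), vocab, bounds


def count(tree, vocab, bounds, term, d):
    """Occurrences of `term` in document d, computed from two rank queries."""
    if term not in vocab:
        return 0
    sid = vocab[term]
    return tree.rank(sid, bounds[d + 1]) - tree.rank(sid, bounds[d])


def rank_documents(query, docs):
    """Return (score, doc_id) pairs sorted by descending cosine similarity."""
    tree, vocab, bounds = build_index(docs)
    n = len(docs)
    terms = sorted(set(query))
    tf = {t: [count(tree, vocab, bounds, t, d) for d in range(n)] for t in terms}
    # Smoothed IDF (an assumption; the description does not give the exact formula).
    idf = {t: math.log((1 + n) / (1 + sum(c > 0 for c in tf[t]))) + 1 for t in terms}
    q_vec = [query.count(t) * idf[t] for t in terms]
    q_norm = math.sqrt(sum(x * x for x in q_vec))
    scores = []
    for d in range(n):
        # Simplification: document vectors are restricted to the query terms.
        d_vec = [tf[t][d] * idf[t] for t in terms]
        d_norm = math.sqrt(sum(x * x for x in d_vec))
        dot = sum(a * b for a, b in zip(q_vec, d_vec))
        scores.append((dot / (q_norm * d_norm) if q_norm and d_norm else 0.0, d))
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Toy transcripts standing in for ASR output (hypothetical data).
docs = [["tax", "reform", "tax"], ["health", "care", "reform"], ["foreign", "policy"]]
ranking = rank_documents(["tax", "reform"], docs)
print([doc_id for _, doc_id in ranking])
```

A per-document term count costs two `rank` queries, each walking one root-to-leaf path, so lookup time grows with the logarithm of the vocabulary size rather than with document length. A production index would replace the Python prefix lists with succinct bit vectors to obtain the space savings the description refers to.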
doi_str_mv 10.1007/s11042-021-10800-8
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2531845477</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2531845477</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-d11f75198d8cf9370cc7315ef874c53dc3147faab09bc48b0c8460f17730d3a83</originalsourceid><addsrcrecordid>eNp9kD1uGzEQhYkgBuLIvoArAq7XmVnuilRpCP4DBKRJaoLiDqVVJHJDUrLdpUmfI_gsPkpOEtobwJ2rmQG-9x7mMXaGcIEA8ktChKauoMYKQQFU6gM7xlaKSsoaP5ZdKKhkC_iJfU5pA4DTtm6O2e9L7sOBttwMQwzGrnkOfKDoQtxxG3ymh_z315-lSdRxs89hZ3JveRrCD_K8C3a_I595pBx7OpgtD44PYdsXqBxpILJrSnzUB8_vTQmjzHMken7qfUcPvV-dsCNntolO_88J-3599W1-Wy2-3tzNLxeVFTjLVYfoZIsz1SnrZkKCtVJgS07JxraiK1QjnTFLmC1to5ZgVTMFh1IK6IRRYsLOR9_y6889paw3YR99idR1K1A1bVPYCatHysaQUiSnh9jvTHzUCPqlbj3WrUvd-rVu_WItRlEqsF9RfLN-R_UPAiOG8Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2531845477</pqid></control><display><type>article</type><title>A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing</title><source>SpringerLink Journals - AutoHoldings</source><creator>Gupta, Anishka ; Yadav, Divakar</creator><creatorcontrib>Gupta, Anishka ; Yadav, Divakar</creatorcontrib><description>Spoken document retrieval for a specific context is a very trending and interesting area of research. It makes it convenient for users to search through archives of speech data, which is not possible manually as it is very time consuming and expensive. In the current article, we focus on performing the same for political speeches, delivered in a variety of environments. The technique used here takes an archive of spoken documents (audio files) as input and performs automatic speech recognition (ASR) on it to derive the textual transcripts, using deep neural networks (DNN), hidden markov models (HMM) and Gaussian mixture models (GMM). 
These transcriptions are further pruned for indexing by applying certain pre-processing techniques. Thereafter, it builds time and space efficient index of the documents using wavelet trees for its retrieval. The constructed index is searched through to find the count of occurrences of the words in the query, fired by the users. These counts are then utilized to calculate the term frequency - inverse document frequency (TF-IDF) scores, and then the similarity score of the query with each document is calculated using cosine similarity method. Finally, the documents are ranked based on these scores in the order of relevance. Therefore, the proposed system develops a speech recognition system and introduces a novel indexing scheme, based on wavelet trees for retrieving data.</description><identifier>ISSN: 1380-7501</identifier><identifier>EISSN: 1573-7721</identifier><identifier>DOI: 10.1007/s11042-021-10800-8</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Archives &amp; records ; Artificial neural networks ; Audio data ; Automatic speech recognition ; Computer Communication Networks ; Computer Science ; Context ; Data Structures and Information Theory ; Deep learning ; Documents ; Indexing ; Markov analysis ; Markov chains ; Multimedia Information Systems ; Neural networks ; Probabilistic models ; Retrieval ; Similarity ; Special Purpose and Application-Based Systems ; Speech recognition ; Speeches ; Voice recognition</subject><ispartof>Multimedia tools and applications, 2021-06, Vol.80 (14), p.22209-22229</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021</rights><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 
2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-d11f75198d8cf9370cc7315ef874c53dc3147faab09bc48b0c8460f17730d3a83</citedby><cites>FETCH-LOGICAL-c319t-d11f75198d8cf9370cc7315ef874c53dc3147faab09bc48b0c8460f17730d3a83</cites><orcidid>0000-0001-6051-479X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11042-021-10800-8$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11042-021-10800-8$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Gupta, Anishka</creatorcontrib><creatorcontrib>Yadav, Divakar</creatorcontrib><title>A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing</title><title>Multimedia tools and applications</title><addtitle>Multimed Tools Appl</addtitle><description>Spoken document retrieval for a specific context is a very trending and interesting area of research. It makes it convenient for users to search through archives of speech data, which is not possible manually as it is very time consuming and expensive. In the current article, we focus on performing the same for political speeches, delivered in a variety of environments. The technique used here takes an archive of spoken documents (audio files) as input and performs automatic speech recognition (ASR) on it to derive the textual transcripts, using deep neural networks (DNN), hidden markov models (HMM) and Gaussian mixture models (GMM). These transcriptions are further pruned for indexing by applying certain pre-processing techniques. 
Thereafter, it builds time and space efficient index of the documents using wavelet trees for its retrieval. The constructed index is searched through to find the count of occurrences of the words in the query, fired by the users. These counts are then utilized to calculate the term frequency - inverse document frequency (TF-IDF) scores, and then the similarity score of the query with each document is calculated using cosine similarity method. Finally, the documents are ranked based on these scores in the order of relevance. Therefore, the proposed system develops a speech recognition system and introduces a novel indexing scheme, based on wavelet trees for retrieving data.</description><subject>Archives &amp; records</subject><subject>Artificial neural networks</subject><subject>Audio data</subject><subject>Automatic speech recognition</subject><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Context</subject><subject>Data Structures and Information Theory</subject><subject>Deep learning</subject><subject>Documents</subject><subject>Indexing</subject><subject>Markov analysis</subject><subject>Markov chains</subject><subject>Multimedia Information Systems</subject><subject>Neural networks</subject><subject>Probabilistic models</subject><subject>Retrieval</subject><subject>Similarity</subject><subject>Special Purpose and Application-Based Systems</subject><subject>Speech recognition</subject><subject>Speeches</subject><subject>Voice 
recognition</subject><issn>1380-7501</issn><issn>1573-7721</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>BENPR</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNp9kD1uGzEQhYkgBuLIvoArAq7XmVnuilRpCP4DBKRJaoLiDqVVJHJDUrLdpUmfI_gsPkpOEtobwJ2rmQG-9x7mMXaGcIEA8ktChKauoMYKQQFU6gM7xlaKSsoaP5ZdKKhkC_iJfU5pA4DTtm6O2e9L7sOBttwMQwzGrnkOfKDoQtxxG3ymh_z315-lSdRxs89hZ3JveRrCD_K8C3a_I595pBx7OpgtD44PYdsXqBxpILJrSnzUB8_vTQmjzHMken7qfUcPvV-dsCNntolO_88J-3599W1-Wy2-3tzNLxeVFTjLVYfoZIsz1SnrZkKCtVJgS07JxraiK1QjnTFLmC1to5ZgVTMFh1IK6IRRYsLOR9_y6889paw3YR99idR1K1A1bVPYCatHysaQUiSnh9jvTHzUCPqlbj3WrUvd-rVu_WItRlEqsF9RfLN-R_UPAiOG8Q</recordid><startdate>20210601</startdate><enddate>20210601</enddate><creator>Gupta, Anishka</creator><creator>Yadav, Divakar</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7T9</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0001-6051-479X</orcidid></search><s
ort><creationdate>20210601</creationdate><title>A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing</title><author>Gupta, Anishka ; Yadav, Divakar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-d11f75198d8cf9370cc7315ef874c53dc3147faab09bc48b0c8460f17730d3a83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Archives &amp; records</topic><topic>Artificial neural networks</topic><topic>Audio data</topic><topic>Automatic speech recognition</topic><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Context</topic><topic>Data Structures and Information Theory</topic><topic>Deep learning</topic><topic>Documents</topic><topic>Indexing</topic><topic>Markov analysis</topic><topic>Markov chains</topic><topic>Multimedia Information Systems</topic><topic>Neural networks</topic><topic>Probabilistic models</topic><topic>Retrieval</topic><topic>Similarity</topic><topic>Special Purpose and Application-Based Systems</topic><topic>Speech recognition</topic><topic>Speeches</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gupta, Anishka</creatorcontrib><creatorcontrib>Yadav, Divakar</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research 
Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Database‎ (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM global</collection><collection>Computing Database</collection><collection>ProQuest Research Library</collection><collection>Research Library (Corporate)</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>One Business 
(ProQuest)</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Multimedia tools and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gupta, Anishka</au><au>Yadav, Divakar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing</atitle><jtitle>Multimedia tools and applications</jtitle><stitle>Multimed Tools Appl</stitle><date>2021-06-01</date><risdate>2021</risdate><volume>80</volume><issue>14</issue><spage>22209</spage><epage>22229</epage><pages>22209-22229</pages><issn>1380-7501</issn><eissn>1573-7721</eissn><abstract>Spoken document retrieval for a specific context is a very trending and interesting area of research. It makes it convenient for users to search through archives of speech data, which is not possible manually as it is very time consuming and expensive. In the current article, we focus on performing the same for political speeches, delivered in a variety of environments. The technique used here takes an archive of spoken documents (audio files) as input and performs automatic speech recognition (ASR) on it to derive the textual transcripts, using deep neural networks (DNN), hidden markov models (HMM) and Gaussian mixture models (GMM). These transcriptions are further pruned for indexing by applying certain pre-processing techniques. Thereafter, it builds time and space efficient index of the documents using wavelet trees for its retrieval. 
The constructed index is searched through to find the count of occurrences of the words in the query, fired by the users. These counts are then utilized to calculate the term frequency - inverse document frequency (TF-IDF) scores, and then the similarity score of the query with each document is calculated using cosine similarity method. Finally, the documents are ranked based on these scores in the order of relevance. Therefore, the proposed system develops a speech recognition system and introduces a novel indexing scheme, based on wavelet trees for retrieving data.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11042-021-10800-8</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0001-6051-479X</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1380-7501
ispartof Multimedia tools and applications, 2021-06, Vol.80 (14), p.22209-22229
issn 1380-7501
1573-7721
language eng
recordid cdi_proquest_journals_2531845477
source SpringerLink Journals - AutoHoldings
subjects Archives & records
Artificial neural networks
Audio data
Automatic speech recognition
Computer Communication Networks
Computer Science
Context
Data Structures and Information Theory
Deep learning
Documents
Indexing
Markov analysis
Markov chains
Multimedia Information Systems
Neural networks
Probabilistic models
Retrieval
Similarity
Special Purpose and Application-Based Systems
Speech recognition
Speeches
Voice recognition
title A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T19%3A58%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20novel%20approach%20to%20perform%20context%E2%80%90based%20automatic%20spoken%20document%20retrieval%20of%20political%20speeches%20based%20on%20wavelet%20tree%C2%A0indexing&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Gupta,%20Anishka&rft.date=2021-06-01&rft.volume=80&rft.issue=14&rft.spage=22209&rft.epage=22229&rft.pages=22209-22229&rft.issn=1380-7501&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-021-10800-8&rft_dat=%3Cproquest_cross%3E2531845477%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2531845477&rft_id=info:pmid/&rfr_iscdi=true