A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing
Spoken document retrieval for a specific context is an active area of research. It lets users search archives of speech data, a task that is too time-consuming and expensive to do manually. In the current article, we focus on performing...
Published in: | Multimedia tools and applications 2021-06, Vol.80 (14), p.22209-22229 |
---|---|
Main authors: | Gupta, Anishka ; Yadav, Divakar |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 22229 |
---|---|
container_issue | 14 |
container_start_page | 22209 |
container_title | Multimedia tools and applications |
container_volume | 80 |
creator | Gupta, Anishka ; Yadav, Divakar |
description | Spoken document retrieval for a specific context is an active area of research. It lets users search archives of speech data, a task that is too time-consuming and expensive to do manually. In the current article, we focus on spoken document retrieval for political speeches delivered in a variety of environments. The technique takes an archive of spoken documents (audio files) as input and performs automatic speech recognition (ASR) on it to derive textual transcripts, using deep neural networks (DNN), hidden Markov models (HMM) and Gaussian mixture models (GMM). The transcripts are then pruned for indexing by applying pre-processing techniques. Next, the system builds a time- and space-efficient index of the documents using wavelet trees. The index is searched to count the occurrences of each word in the user's query. These counts are used to calculate term frequency-inverse document frequency (TF-IDF) scores, and the similarity of the query to each document is then computed with cosine similarity. Finally, the documents are ranked by these scores in order of relevance. The proposed system thus combines a speech recognition system with a novel indexing scheme, based on wavelet trees, for retrieving data. |
doi_str_mv | 10.1007/s11042-021-10800-8 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2531845477</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2531845477</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-d11f75198d8cf9370cc7315ef874c53dc3147faab09bc48b0c8460f17730d3a83</originalsourceid><addsrcrecordid>eNp9kD1uGzEQhYkgBuLIvoArAq7XmVnuilRpCP4DBKRJaoLiDqVVJHJDUrLdpUmfI_gsPkpOEtobwJ2rmQG-9x7mMXaGcIEA8ktChKauoMYKQQFU6gM7xlaKSsoaP5ZdKKhkC_iJfU5pA4DTtm6O2e9L7sOBttwMQwzGrnkOfKDoQtxxG3ymh_z315-lSdRxs89hZ3JveRrCD_K8C3a_I595pBx7OpgtD44PYdsXqBxpILJrSnzUB8_vTQmjzHMken7qfUcPvV-dsCNntolO_88J-3599W1-Wy2-3tzNLxeVFTjLVYfoZIsz1SnrZkKCtVJgS07JxraiK1QjnTFLmC1to5ZgVTMFh1IK6IRRYsLOR9_y6889paw3YR99idR1K1A1bVPYCatHysaQUiSnh9jvTHzUCPqlbj3WrUvd-rVu_WItRlEqsF9RfLN-R_UPAiOG8Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2531845477</pqid></control><display><type>article</type><title>A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing</title><source>SpringerLink Journals - AutoHoldings</source><creator>Gupta, Anishka ; Yadav, Divakar</creator><creatorcontrib>Gupta, Anishka ; Yadav, Divakar</creatorcontrib><description>Spoken document retrieval for a specific context is a very trending and interesting area of research. It makes it convenient for users to search through archives of speech data, which is not possible manually as it is very time consuming and expensive. In the current article, we focus on performing the same for political speeches, delivered in a variety of environments. The technique used here takes an archive of spoken documents (audio files) as input and performs automatic speech recognition (ASR) on it to derive the textual transcripts, using deep neural networks (DNN), hidden markov models (HMM) and Gaussian mixture models (GMM). 
These transcriptions are further pruned for indexing by applying certain pre-processing techniques. Thereafter, it builds time and space efficient index of the documents using wavelet trees for its retrieval. The constructed index is searched through to find the count of occurrences of the words in the query, fired by the users. These counts are then utilized to calculate the term frequency - inverse document frequency (TF-IDF) scores, and then the similarity score of the query with each document is calculated using cosine similarity method. Finally, the documents are ranked based on these scores in the order of relevance. Therefore, the proposed system develops a speech recognition system and introduces a novel indexing scheme, based on wavelet trees for retrieving data.</description><identifier>ISSN: 1380-7501</identifier><identifier>EISSN: 1573-7721</identifier><identifier>DOI: 10.1007/s11042-021-10800-8</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Archives & records ; Artificial neural networks ; Audio data ; Automatic speech recognition ; Computer Communication Networks ; Computer Science ; Context ; Data Structures and Information Theory ; Deep learning ; Documents ; Indexing ; Markov analysis ; Markov chains ; Multimedia Information Systems ; Neural networks ; Probabilistic models ; Retrieval ; Similarity ; Special Purpose and Application-Based Systems ; Speech recognition ; Speeches ; Voice recognition</subject><ispartof>Multimedia tools and applications, 2021-06, Vol.80 (14), p.22209-22229</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021</rights><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 
2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-d11f75198d8cf9370cc7315ef874c53dc3147faab09bc48b0c8460f17730d3a83</citedby><cites>FETCH-LOGICAL-c319t-d11f75198d8cf9370cc7315ef874c53dc3147faab09bc48b0c8460f17730d3a83</cites><orcidid>0000-0001-6051-479X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11042-021-10800-8$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11042-021-10800-8$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Gupta, Anishka</creatorcontrib><creatorcontrib>Yadav, Divakar</creatorcontrib><title>A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing</title><title>Multimedia tools and applications</title><addtitle>Multimed Tools Appl</addtitle><description>Spoken document retrieval for a specific context is a very trending and interesting area of research. It makes it convenient for users to search through archives of speech data, which is not possible manually as it is very time consuming and expensive. In the current article, we focus on performing the same for political speeches, delivered in a variety of environments. The technique used here takes an archive of spoken documents (audio files) as input and performs automatic speech recognition (ASR) on it to derive the textual transcripts, using deep neural networks (DNN), hidden markov models (HMM) and Gaussian mixture models (GMM). These transcriptions are further pruned for indexing by applying certain pre-processing techniques. 
Thereafter, it builds time and space efficient index of the documents using wavelet trees for its retrieval. The constructed index is searched through to find the count of occurrences of the words in the query, fired by the users. These counts are then utilized to calculate the term frequency - inverse document frequency (TF-IDF) scores, and then the similarity score of the query with each document is calculated using cosine similarity method. Finally, the documents are ranked based on these scores in the order of relevance. Therefore, the proposed system develops a speech recognition system and introduces a novel indexing scheme, based on wavelet trees for retrieving data.</description><subject>Archives & records</subject><subject>Artificial neural networks</subject><subject>Audio data</subject><subject>Automatic speech recognition</subject><subject>Computer Communication Networks</subject><subject>Computer Science</subject><subject>Context</subject><subject>Data Structures and Information Theory</subject><subject>Deep learning</subject><subject>Documents</subject><subject>Indexing</subject><subject>Markov analysis</subject><subject>Markov chains</subject><subject>Multimedia Information Systems</subject><subject>Neural networks</subject><subject>Probabilistic models</subject><subject>Retrieval</subject><subject>Similarity</subject><subject>Special Purpose and Application-Based Systems</subject><subject>Speech recognition</subject><subject>Speeches</subject><subject>Voice 
recognition</subject><issn>1380-7501</issn><issn>1573-7721</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>BENPR</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNp9kD1uGzEQhYkgBuLIvoArAq7XmVnuilRpCP4DBKRJaoLiDqVVJHJDUrLdpUmfI_gsPkpOEtobwJ2rmQG-9x7mMXaGcIEA8ktChKauoMYKQQFU6gM7xlaKSsoaP5ZdKKhkC_iJfU5pA4DTtm6O2e9L7sOBttwMQwzGrnkOfKDoQtxxG3ymh_z315-lSdRxs89hZ3JveRrCD_K8C3a_I595pBx7OpgtD44PYdsXqBxpILJrSnzUB8_vTQmjzHMken7qfUcPvV-dsCNntolO_88J-3599W1-Wy2-3tzNLxeVFTjLVYfoZIsz1SnrZkKCtVJgS07JxraiK1QjnTFLmC1to5ZgVTMFh1IK6IRRYsLOR9_y6889paw3YR99idR1K1A1bVPYCatHysaQUiSnh9jvTHzUCPqlbj3WrUvd-rVu_WItRlEqsF9RfLN-R_UPAiOG8Q</recordid><startdate>20210601</startdate><enddate>20210601</enddate><creator>Gupta, Anishka</creator><creator>Yadav, Divakar</creator><general>Springer US</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7T9</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0001-6051-479X</orcidid></search><s
ort><creationdate>20210601</creationdate><title>A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing</title><author>Gupta, Anishka ; Yadav, Divakar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-d11f75198d8cf9370cc7315ef874c53dc3147faab09bc48b0c8460f17730d3a83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Archives & records</topic><topic>Artificial neural networks</topic><topic>Audio data</topic><topic>Automatic speech recognition</topic><topic>Computer Communication Networks</topic><topic>Computer Science</topic><topic>Context</topic><topic>Data Structures and Information Theory</topic><topic>Deep learning</topic><topic>Documents</topic><topic>Indexing</topic><topic>Markov analysis</topic><topic>Markov chains</topic><topic>Multimedia Information Systems</topic><topic>Neural networks</topic><topic>Probabilistic models</topic><topic>Retrieval</topic><topic>Similarity</topic><topic>Special Purpose and Application-Based Systems</topic><topic>Speech recognition</topic><topic>Speeches</topic><topic>Voice recognition</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gupta, Anishka</creatorcontrib><creatorcontrib>Yadav, Divakar</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>Linguistics and Language Behavior Abstracts (LLBA)</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research 
Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Database (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM global</collection><collection>Computing Database</collection><collection>ProQuest Research Library</collection><collection>Research Library (Corporate)</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One 
Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Multimedia tools and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gupta, Anishka</au><au>Yadav, Divakar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing</atitle><jtitle>Multimedia tools and applications</jtitle><stitle>Multimed Tools Appl</stitle><date>2021-06-01</date><risdate>2021</risdate><volume>80</volume><issue>14</issue><spage>22209</spage><epage>22229</epage><pages>22209-22229</pages><issn>1380-7501</issn><eissn>1573-7721</eissn><abstract>Spoken document retrieval for a specific context is a very trending and interesting area of research. It makes it convenient for users to search through archives of speech data, which is not possible manually as it is very time consuming and expensive. In the current article, we focus on performing the same for political speeches, delivered in a variety of environments. The technique used here takes an archive of spoken documents (audio files) as input and performs automatic speech recognition (ASR) on it to derive the textual transcripts, using deep neural networks (DNN), hidden markov models (HMM) and Gaussian mixture models (GMM). These transcriptions are further pruned for indexing by applying certain pre-processing techniques. Thereafter, it builds time and space efficient index of the documents using wavelet trees for its retrieval. The constructed index is searched through to find the count of occurrences of the words in the query, fired by the users. 
These counts are then utilized to calculate the term frequency - inverse document frequency (TF-IDF) scores, and then the similarity score of the query with each document is calculated using cosine similarity method. Finally, the documents are ranked based on these scores in the order of relevance. Therefore, the proposed system develops a speech recognition system and introduces a novel indexing scheme, based on wavelet trees for retrieving data.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11042-021-10800-8</doi><tpages>21</tpages><orcidid>https://orcid.org/0000-0001-6051-479X</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1380-7501 |
ispartof | Multimedia tools and applications, 2021-06, Vol.80 (14), p.22209-22229 |
issn | 1380-7501 ; 1573-7721 |
language | eng |
recordid | cdi_proquest_journals_2531845477 |
source | SpringerLink Journals - AutoHoldings |
subjects | Archives & records ; Artificial neural networks ; Audio data ; Automatic speech recognition ; Computer Communication Networks ; Computer Science ; Context ; Data Structures and Information Theory ; Deep learning ; Documents ; Indexing ; Markov analysis ; Markov chains ; Multimedia Information Systems ; Neural networks ; Probabilistic models ; Retrieval ; Similarity ; Special Purpose and Application-Based Systems ; Speech recognition ; Speeches ; Voice recognition |
title | A novel approach to perform context‐based automatic spoken document retrieval of political speeches based on wavelet tree indexing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T19%3A58%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20novel%20approach%20to%20perform%20context%E2%80%90based%20automatic%20spoken%20document%20retrieval%20of%20political%20speeches%20based%20on%20wavelet%20tree%C2%A0indexing&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Gupta,%20Anishka&rft.date=2021-06-01&rft.volume=80&rft.issue=14&rft.spage=22209&rft.epage=22229&rft.pages=22209-22229&rft.issn=1380-7501&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-021-10800-8&rft_dat=%3Cproquest_cross%3E2531845477%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2531845477&rft_id=info:pmid/&rfr_iscdi=true |
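
The retrieval stage summarized in the description field (wavelet tree index, occurrence counts, TF-IDF, cosine similarity, ranking) can be illustrated with a small sketch. This is not the authors' implementation: the record does not specify how the trees are laid out or which TF-IDF variant is used. The sketch assumes one pointer-based wavelet tree per transcript, built over integer term ids, and a smoothed IDF of the form ln((1 + N) / (1 + df)) + 1. The names `WaveletTree`, `build_index`, and `rank_documents` are hypothetical, and the ASR and pre-processing steps are assumed to have already produced token lists.

```python
import math
from collections import Counter


class WaveletTree:
    """Pointer-based wavelet tree over integer symbols in [lo, hi] (illustrative only)."""

    def __init__(self, seq, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        # prefix[i] = number of symbols among seq[:i] routed to the left child
        self.prefix = [0]
        if lo == hi or not seq:
            return  # leaf node or empty subtree
        mid = (lo + hi) // 2
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def rank(self, symbol, i):
        """Return how many times `symbol` occurs in the first i positions."""
        if symbol < self.lo or symbol > self.hi:
            return 0
        if self.lo == self.hi:
            return i  # every position that reaches this leaf holds `symbol`
        if self.left is None:
            return 0  # empty subtree
        mid = (self.lo + self.hi) // 2
        to_left = self.prefix[i]
        if symbol <= mid:
            return self.left.rank(symbol, to_left)
        return self.right.rank(symbol, i - to_left)


def build_index(docs):
    """Map terms to ids and build one wavelet tree per tokenized document."""
    vocab = {t: k for k, t in enumerate(sorted({t for d in docs for t in d}))}
    trees = [WaveletTree([vocab[t] for t in d], 0, len(vocab) - 1) for d in docs]
    return vocab, trees


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def rank_documents(docs, query):
    """Return (score, doc_id) pairs sorted from most to least similar to the query."""
    vocab, trees = build_index(docs)
    terms = list(vocab)  # dict order matches term-id order
    n = len(docs)
    # Term counts come from rank queries over the whole document.
    counts = [[tree.rank(vocab[w], len(d)) for w in terms]
              for tree, d in zip(trees, docs)]
    # Assumed smoothed IDF; the abstract does not state the exact formula.
    idf = [math.log((1 + n) / (1 + sum(row[j] > 0 for row in counts))) + 1
           for j in range(len(terms))]
    q = Counter(query)
    qvec = [q[w] * idf[j] for j, w in enumerate(terms)]
    scored = [(cosine(qvec, [c * w for c, w in zip(row, idf)]), doc_id)
              for doc_id, row in enumerate(counts)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    transcripts = [["tax", "reform", "tax", "vote"],
                   ["health", "vote", "reform"],
                   ["tax", "tax", "tax"]]
    for score, doc_id in rank_documents(transcripts, ["tax", "reform"]):
        print(doc_id, round(score, 3))
```

A production index would more likely store the per-node prefix counts as succinct bit vectors with constant-time rank support; the plain Python lists here trade that space efficiency for readability.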