A Review on Language-Independent Search on Speech and its Applications

Bibliographic Details
Published in:	IEEE Access 2024, Vol. 12, p. 194182-194202
Main Authors:	Kulkarni, Sushil Venkatesh; Pal, Sukomal
Format:	Article
Language:	English
Subjects:	Acoustics; Audio data; Automatic speech recognition; Costs; Cross modal representation; Feature extraction; Hypothesis testing; Imperative sentences; Information retrieval; Machine learning; Multilingual; Pattern matching; Phonetics; Reviews; Search methods; Speech; Speech detection techniques; Speech recognition; Surveys; Time series analysis; Vectors; Voice communication; Voice recognition
Online Access:	Full text

Description
Summary:	This study presents a thorough analysis of language-independent search methods and models for speech detection, a crucial task in retrieving audio files from large archives based on spoken queries. Unlike traditional speech recognition, this "zero-resource task" doesn't require specific training data or lexical information, relying on hypothesis testing and pattern matching instead. Spoken term detection is the process of searching large audio databases for spoken terms. Typically, this involves text-based "spoken term datasets" of specific languages, where sufficient data are available to train automatic speech recognition systems. Speech recognition enables human-machine communication through a variety of voice commands and clear instructions. Telephone and cellular systems are examples of such applications. The study examines modern spoken-term detection systems, highlighting significant advancements and performance improvements. It delves into various speech recognition techniques used in cross-media retrieval systems and machine learning methodologies, emphasizing the practical information retrieval capabilities of cross-modal learning approaches. The research aims to provide an in-depth analysis of methods combining text and image features, addressing topics previously overlooked in surveys. The motivation behind this study stems from the lack of comprehensive reviews on "image and text modalities," ongoing challenges in the "cross-modal retrieval field," and the untapped potential of image and text features in cross-modal retrieval development. By exploring state-of-the-art language-independent search for speech recognition, this study sheds light on sophisticated applications and paves the way for further advancements in the field.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3520394