A Review on Language-Independent Search on Speech and its Applications
Published in: | IEEE Access 2024, Vol. 12, p. 194182-194202 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | This study presents a thorough analysis of language-independent search methods and models for speech detection, a crucial task in retrieving audio files from large archives based on spoken queries. Unlike traditional speech recognition, this "zero-resource task" does not require specific training data or lexical information, relying on hypothesis testing and pattern matching instead. Spoken term detection is the process of searching large audio databases for spoken terms. Typically, this involves text-based "spoken term datasets" of specific languages, where sufficient data are available to train automatic speech recognition systems. Speech recognition enables human-machine communication through a variety of voice commands and clear instructions. Telephone and cellular systems are examples of such applications. The study examines modern spoken-term detection systems, highlighting significant advancements and performance improvements. It delves into various speech recognition techniques used in cross-media retrieval systems and machine learning methodologies, emphasizing the practical information retrieval capabilities of cross-modal learning approaches. The research aims to provide an in-depth analysis of methods combining text and image features, addressing topics previously overlooked in surveys. The motivation behind this study stems from the lack of comprehensive reviews on "image and text modalities," ongoing challenges in the "cross-modal retrieval field," and the untapped potential of image and text features in cross-modal retrieval development. By exploring state-of-the-art language-independent searches for speech recognition, this study sheds light on sophisticated applications and paves the way for further advancements in the field. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3520394 |