MULTI-CLASS SPOKEN LANGUAGE DETECTION USING ARTIFICIAL INTELLIGENCE WITH FRACTAL AL-BIRUNI EARTH RADIUS OPTIMIZATION ALGORITHM

Bibliographic Details
Published in: Fractals (Singapore) 2024-12, Vol.32 (9n10)
Main authors: AL-SHATHRY, NAJLA I., ELTAHIR, MAJDY M., ASKLANY, SOMIA A., AL GHAMDI, SAMI A., ALMUHAIMEED, ABDULLAH, ALANAZI, FUHID, MOHAMED, ABDELMONEIM ALI, RIZWANULLAH, MOHAMMED
Format: Article
Language: English
Description
Abstract: Spoken Language Identification (SLID) is the problem of classifying the language spoken in an audio clip. SLID is valuable in multi-language speech recognition systems, personalized voice assistants, and automated speech translation systems, and in call centers, where it can automatically route calls to an operator who speaks the caller's language. A primary challenge is detecting the language accurately and with short delay from audio with varying noise levels and sampling rates. A further problem is distinguishing languages from short-duration audio. Previous research has applied lexical, phonetic, phonotactic, and prosodic features to SLID. Spoken language detection using deep learning (DL) usually involves training RNN or CNN models on audio features such as spectrograms or MFCCs to classify the language spoken in audio samples. More advanced approaches, such as transformers or CNN–RNN hybrids, can capture both spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique. The MCSLD-AIBER technique mainly aims to identify multiple classes of spoken languages. In the MCSLD-AIBER technique, the Constant-Q Transform (CQT) is applied to transform the speech signals. The technique then employs an Inception with Residual Network model for feature extraction, and the hyperparameters are tuned using the BER approach. A long short-term memory (LSTM) network is used to identify the spoken languages. A set of experiments was conducted to illustrate the performance of the MCSLD-AIBER technique. The simulation outcomes indicated that the MCSLD-AIBER method outperforms the other models.
ISSN:0218-348X
1793-6543
DOI:10.1142/S0218348X25400547
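
Note on the Constant-Q Transform step: the abstract names the CQT as the first stage of the pipeline but gives no parameters. The following is a minimal sketch of the textbook direct-evaluation CQT for a single frame, written with the Python standard library only. It is not the authors' implementation; the function name `naive_cqt_frame`, the Hann window, and all parameter values (`f_min`, `n_bins`, `bins_per_octave`, the 8 kHz sample rate, the 440 Hz test tone) are assumptions chosen for illustration.

```python
import cmath
import math


def naive_cqt_frame(signal, sample_rate, f_min=220.0, n_bins=24, bins_per_octave=12):
    """Return CQT magnitudes for one frame starting at signal[0].

    Hypothetical helper for illustration: bin k is centred on
    f_min * 2**(k / bins_per_octave), and every bin uses the same
    quality factor Q, so lower bins use longer windows.
    """
    # Q = centre frequency / bandwidth, identical for all bins.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    magnitudes = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        # Window length in samples shrinks as the centre frequency rises.
        n_k = math.ceil(q * sample_rate / f_k)
        if n_k > len(signal):
            raise ValueError("signal too short for the lowest-frequency bin")
        acc = 0j
        for n in range(n_k):
            # Hann window (an assumed choice) limits spectral leakage.
            window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_k)
            acc += window * signal[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        # Normalise by window length so bins are comparable.
        magnitudes.append(abs(acc) / n_k)
    return magnitudes


# Usage: a pure 440 Hz tone sampled at 8 kHz (illustrative values).
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(1024)]
spectrum = naive_cqt_frame(tone, sample_rate)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

With these assumed settings, bin 12 is centred on 220 · 2^(12/12) = 440 Hz, so the strongest response is expected there. In a full pipeline, frames like this would be stacked over time into a two-dimensional time-frequency image for the feature extractor; a practical system would more likely use an optimized library routine than this direct loop.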