Transfer learning from adult to children for speech recognition: Evaluation, analysis and recommendations

Full description

Bibliographic details
Published in: Computer Speech & Language, 2020-09, Vol. 63, p. 101077, Article 101077
Main authors: Gurunath Shivakumar, Prashanth; Georgiou, Panayiotis
Format: Article
Language: English
Keywords:
Online access: Full text
Description
Abstract:

Highlights:
• In this work, we conduct evaluations on large vocabulary continuous speech recognition (LVCSR) for children, to:
  • Compare older GMM-HMM models and newer DNN models.
  • Investigate different transfer learning adaptation techniques.
  • Assess the effectiveness of speaker normalization and adaptation techniques such as VTLN, fMLLR, and i-vector based adaptation versus the employed transfer learning technique.
• Further, we conduct analysis over the following parameters in the context of transfer learning:
  • DNN model parameters.
  • Amount of adaptation data.
  • Effect of children's ages.
  • Age-dependent transformations obtained from transfer learning, and their validity and portability over the children's age span.
• We finally provide recommendations on:
  • Favorable transfer learning adaptation strategies for low-data and high-data scenarios.
  • Suggested transfer learning adaptation techniques for children of different ages.
  • Amount of adaptation data required for efficient performance over children's ages.
  • Potential future research directions and relevant challenges and problems persisting in children's speech recognition.

Children's speech recognition is challenging mainly due to the inherent high variability in children's physical and articulatory characteristics and expressions. This variability manifests in both acoustic constructs and linguistic usage because of the rapidly changing developmental stage in children's life. Part of the challenge is due to the lack of large amounts of available children's speech data for efficient modeling. This work attempts to address the key challenges using transfer learning from adult models to children's models in a Deep Neural Network (DNN) framework for the children's Automatic Speech Recognition (ASR) task, evaluating on multiple children's speech corpora with a large vocabulary. The paper presents a systematic and extensive analysis of the proposed transfer learning technique, considering the key factors affecting children's speech recognition from prior literature. Evaluations are presented on (i) comparisons of earlier GMM-HMM and newer DNN models, (ii) the effectiveness of standard adaptation techniques versus transfer learning, and (iii) various adaptation configurations in tackling the variabilities present in children's speech, in terms of (a) acoustic spectral variability and (b) pronunciation variability and linguistic constraints. Our analysis spans (i) the number of DNN model parameters (for adaptation), (ii) the amount of adaptation data, and (iii) the ages of the children.
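To make the adaptation strategy in the abstract concrete, the sketch below shows one common form of DNN transfer learning for this setting: start from an acoustic model trained on adult speech, freeze the lower hidden layers, and fine-tune the remaining layers on a smaller children's data set. This is only an illustrative sketch, not the authors' exact recipe; the PyTorch framework, network depth, feature dimensions, senone count, frozen-layer choice, checkpoint path, and the random stand-in data are all assumptions for demonstration.

```python
# Minimal sketch (not the paper's exact recipe): adapt a DNN acoustic model
# trained on adult speech to children's speech by freezing the lower layers
# and fine-tuning only the upper layers on the smaller children's data set.
import torch
import torch.nn as nn

FEAT_DIM = 40 * 11      # e.g. 40 filterbank coefficients with +/-5 frames of context (assumed)
NUM_SENONES = 3000      # size of the tied-state (senone) output layer (assumed)

class AcousticDNN(nn.Module):
    def __init__(self, feat_dim=FEAT_DIM, hidden=1024, layers=5, out_dim=NUM_SENONES):
        super().__init__()
        dims = [feat_dim] + [hidden] * layers
        self.hidden = nn.Sequential(
            *[nn.Sequential(nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
              for i in range(layers)]
        )
        self.output = nn.Linear(hidden, out_dim)

    def forward(self, x):
        return self.output(self.hidden(x))

# 1) Start from a model trained on adult speech (here a freshly initialised
#    stand-in; in practice you would load the adult-trained checkpoint).
model = AcousticDNN()
# model.load_state_dict(torch.load("adult_am.pt"))  # hypothetical checkpoint path

# 2) Freeze all hidden layers except the top one; only the top hidden layer and
#    the output layer are updated on children's data (one possible configuration).
for block in list(model.hidden.children())[:-1]:
    for p in block.parameters():
        p.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)
criterion = nn.CrossEntropyLoss()

# 3) Fine-tune on children's speech (random tensors stand in for real
#    frame-level features and senone alignments).
child_feats = torch.randn(256, FEAT_DIM)
child_labels = torch.randint(0, NUM_SENONES, (256,))
for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(child_feats), child_labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

Which layers are retrained and how much children's data is used are exactly the variables the paper's analysis examines (number of adapted DNN parameters, amount of adaptation data, and children's ages), so this frozen-lower-layers configuration should be read as one point in that design space rather than a recommended setting.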
ISSN: 0885-2308
eISSN: 1095-8363
DOI: 10.1016/j.csl.2020.101077