Ability of Human Auditory Perception to Distinguish Human-imitated Speech

Bibliographic Details
Published in: IEEE Access, 2025-01, Vol. 13, p. 1-1
Main authors: Zaman, Khalid; Li, Kai; Samiul, Islam J A M; Uezu, Yasufumi; Kidani, Shunsuke; Unoki, Masashi
Format: Article
Language: English
Description
Abstract: Distinguishing human-imitated speech from genuine speech presents a significant challenge for listeners because of their natural resemblance. Human auditory perception (HAP) has been widely studied to understand its mechanisms, and HAP-based acoustic features and metrics have been applied in various applications to assess sound quality and discriminate sound events. Leveraging these insights, this study evaluates how effectively HAP distinguishes genuine from imitated speech through a systematic subjective listening test. Because no comparable dataset is publicly available, a dataset of human-imitated speech was specially developed for this task. A three-phase, human-centered approach was used to evaluate HAP ability, and participants achieved an average accuracy of 70.10% in distinguishing genuine from imitated speech in the final test. Additionally, a feasibility study of machine classification was conducted using two feature sets: among the timbral features, boominess and depth performed best, with accuracies of 62% and 60%, respectively, while general features such as Mel-spectrograms reached 51%. These results underscore the importance of auditory-related features for effectively detecting imitated speech.
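The record does not describe the paper's implementation, so the following is only a minimal sketch of the kind of Mel-spectrogram baseline the feasibility study mentions. It assumes librosa for feature extraction and scikit-learn for classification; the file paths, the time-averaged feature representation, and the logistic-regression classifier are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a Mel-spectrogram baseline for genuine-vs-imitated
# classification. The paper's actual pipeline is not described in this
# record; librosa, scikit-learn, the logistic-regression classifier,
# and all file paths below are illustrative assumptions.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def mel_features(path, sr=16000, n_mels=64):
    """Return a fixed-length log-Mel feature vector for one audio clip."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    # Average the log-Mel spectrogram over time so clips of any length
    # map to a single n_mels-dimensional vector.
    return librosa.power_to_db(mel).mean(axis=1)

# Hypothetical file lists; replace with real genuine/imitated clips.
genuine_files = ["genuine_001.wav", "genuine_002.wav"]
imitated_files = ["imitated_001.wav", "imitated_002.wav"]

X = np.stack([mel_features(f) for f in genuine_files + imitated_files])
y = np.array([0] * len(genuine_files) + [1] * len(imitated_files))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

In the study itself, timbral features such as boominess and depth outperformed general representations of this kind, which suggests that perceptually motivated features carry more discriminative information for this task than a plain Mel-spectrogram baseline.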
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3526631