Speaker Identification for Business-Card-Type Sensors

Bibliographic details
Published in: IEEE Open Journal of the Computer Society, 2021, Vol. 2, p. 216-226
Main authors: Yamaguchi, Shunpei; Oshima, Ritsuko; Oshima, Jun; Shiina, Ryota; Fujihashi, Takuya; Saruwatari, Shunsuke; Watanabe, Takashi
Format: Article
Language: English
Online access: Full text
Description
Summary: Human collaboration has a great impact on the performance of multi-person activities. The analysis of speaker information and speech timing can be used to extract human collaboration data in detail. Some studies have extracted human collaboration data by identifying a speaker with business-card-type sensors. However, it is difficult to realize speaker identification for business-card-type sensors at low cost and high accuracy because of spikes in the measured sound pressure data, ambient noise picked up by non-speaker sensors, and synchronization errors across sensors. This study proposes a novel sound pressure sensor and speaker identification algorithm to realize speaker identification for business-card-type sensors. The sensor extracts the user's speech at low cost and high accuracy by employing a peak hold circuit for spike mitigation and a time synchronization module for precise time synchronization. The algorithm identifies a speaker with high accuracy by removing ambient noise. The evaluations show that the algorithm accurately identifies a speaker in a multi-person activity across varying numbers of users, environmental noises, and reverberation conditions, as well as for long or short utterances. In addition, the peak hold circuit enables accurate extraction of speech, and the synchronization error between the sensors is always within ±30 µs, which is negligible.
ISSN: 2644-1268
DOI: 10.1109/OJCS.2021.3075469
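
The summary names two ideas that a short sketch can make concrete: a peak hold stage that keeps a brief speech peak visible between sparse samples, and speaker identification that discards ambient noise picked up by non-speaker sensors. This record does not describe the paper's circuit or algorithm in detail, so the following Python is only a generic illustration under stated assumptions. The function names `peak_hold` and `identify_speaker`, the `decay` factor, the fixed `noise_floor` threshold, and the "loudest sensor wins" rule are all hypothetical and are not taken from the paper, where the peak hold is a hardware circuit rather than software.

```python
from typing import List, Optional, Sequence


def peak_hold(samples: Sequence[float], decay: float = 0.9) -> List[float]:
    """Software analogue of a peak hold stage (assumed behavior, not the paper's circuit).

    The held value jumps to any larger input magnitude immediately and
    otherwise shrinks by the factor `decay` on each step.
    """
    held = 0.0
    out: List[float] = []
    for x in samples:
        held = max(abs(float(x)), held * decay)
        out.append(held)
    return out


def identify_speaker(levels: Sequence[float], noise_floor: float) -> Optional[int]:
    """Pick the sensor with the highest level in one time slot.

    Assumption: the speaker's own sensor records the highest sound pressure,
    and any level at or below `noise_floor` is treated as ambient noise.
    Returns the sensor index, or None when nobody appears to be speaking.
    """
    if not levels:
        return None
    best = max(range(len(levels)), key=lambda i: levels[i])
    return best if levels[best] > noise_floor else None


if __name__ == "__main__":
    # One short spike followed by silence: the held value fades gradually.
    print(peak_hold([0.0, 1.0, 0.0, 0.0], decay=0.5))
    # Three sensors in the same time slot; sensor 1 is clearly the loudest.
    print(identify_speaker([0.2, 0.9, 0.3], noise_floor=0.5))
```

A fixed threshold is the simplest possible noise rule. A real deployment would likely need a per-sensor or adaptive noise estimate, and it would depend on the tight inter-sensor synchronization that the summary reports.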