Realtime voice activity and pitch modulation for laryngectomy transducers using head and facial gestures


Bibliographic Details
Published in: The Journal of the Acoustical Society of America, 2015-04, Vol. 137 (4_Supplement), p. 2302
Authors: Mohan, Gautam; Hamilton, Katherine; Grasberger, Andrew; Lammert, Adam C.; Waterman, Jason
Format: Article
Language: English
Online access: Full text
Description
Abstract: Individuals who have undergone laryngectomy often rely on handheld transducers (i.e., the electrolarynx) to excite the vocal tract and produce speech. Widely used electrolarynx designs are limited in that they require manual control of voice activity and pitch modulation. It would be advantageous to have an interface that requires less training, perhaps using the remaining, intact speech production system as a scaffold. Strong evidence exists that aspects of head motion and facial gestures are highly correlated with gestures of voicing and pitch. The goal of project MANATEE is therefore to develop an electrolarynx control interface that takes advantage of those correlations. The focus of the current study is to determine the feasibility of using head and facial features to accurately and efficiently modulate the pitch of a speaker's electrolarynx in real time on a mobile platform, using the built-in video camera. A prototype interface, capable of running on desktop machines and compatible Android devices, is implemented using OpenCV for video feature extraction and statistical prediction of the electrolarynx control signal. Initial performance evaluation is promising, showing pitch prediction accuracies at double the chance-level baseline and prediction delays well below the perceptually relevant ~50 ms threshold.
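The abstract names OpenCV for video feature extraction and "statistical prediction" for the control signal, but does not disclose the feature set or the predictor. The following Python sketch is therefore a rough, hypothetical illustration of such a pipeline, not the authors' implementation: it detects a face with OpenCV's stock Haar cascade, reduces the detection to two normalized features, and maps them through an assumed linear model to a pitch value. The weights W and b, the send_pitch hook, and the choice of face detector are all placeholders; the per-frame delay printout simply mirrors the abstract's ~50 ms latency criterion.

# Hypothetical sketch of a camera-to-pitch pipeline; not the MANATEE code.
import time

import cv2
import numpy as np

# Stock Haar-cascade face detector bundled with opencv-python.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Assumed linear pitch model: the paper's statistical predictor is
# unspecified, so placeholder weights map two face features to Hz.
W = np.array([-80.0, 30.0])  # [vertical face position, face height]
b = 150.0                    # baseline pitch (Hz)

cap = cv2.VideoCapture(0)    # built-in camera, as in the abstract
while cap.isOpened():
    t0 = time.perf_counter()
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces):
        x, y, w, h = faces[0]
        # Normalize by frame height so features are resolution-independent.
        feats = np.array([y / frame.shape[0], h / frame.shape[0]])
        pitch_hz = float(W @ feats + b)
        # send_pitch(pitch_hz)  # hypothetical hook to the transducer driver
        delay_ms = (time.perf_counter() - t0) * 1e3
        print(f"pitch={pitch_hz:6.1f} Hz  delay={delay_ms:5.1f} ms")
    cv2.imshow("face features", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()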
ISSN: 0001-4966 (print), 1520-8524 (online)
DOI: 10.1121/1.4920403