A Comparison of Online Automatic Speech Recognition Systems and the Nonverbal Responses to Unintelligible Speech
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Automatic Speech Recognition (ASR) systems have proliferated over the recent
years to the point that free platforms such as YouTube now provide speech
recognition services. Given the wide selection of ASR systems, we contribute to
the field of automatic speech recognition by comparing the relative performance
of two sets of manual transcriptions and five sets of automatic transcriptions
(Google Cloud, IBM Watson, Microsoft Azure, Trint, and YouTube) to help
researchers to select accurate transcription services. In addition, we identify
nonverbal behaviors that are associated with unintelligible speech, as
indicated by high word error rates. We show that manual transcriptions remain
superior to current automatic transcriptions. Amongst the automatic
transcription services, YouTube offers the most accurate transcription service.
For non-verbal behavioral involvement, we provide evidence that the variability
of smile intensities from the listener is high (low) when the speaker is clear
(unintelligible). These findings are derived from videoconferencing
interactions between student doctors and simulated patients; therefore, we
contribute towards both the ASR literature and the healthcare communication
skills teaching community. |
---|---|
DOI: | 10.48550/arxiv.1904.12403 |
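
The summary uses word error rate (WER) as its indicator of unintelligible speech. As a minimal sketch of the standard definition, WER = (substitutions + deletions + insertions) / number of reference words, the following computes it with a word-level edit distance. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for illustration; the record does not state how the authors normalized their transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace; the paper's
    own normalization is not described in this record.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    manual = "the patient has a headache"
    automatic = "the patient has headache"
    print(word_error_rate(manual, automatic))
```

Here the manual transcript plays the role of the reference and an automatic transcript the hypothesis; a higher value means the automatic service diverges more from the manual transcription. Because insertions are counted, the value can exceed 1.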