A Spanish multispeaker database of esophageal speech
Published in: | Computer speech & language 2021-03, Vol.66, p.101168, Article 101168 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | • A novel database of Spanish esophageal voices designed for developing new software for speakers with this pathology. • Includes recordings of 100 sentences from 30 esophageal speakers. The size and variety of the database make it unique. • Includes the phonetic labels of the recordings. The automatic phonetic labelling procedure is described. • The main acoustic characteristics of the voices are described and compared with a reduced set of healthy voices. • The database has been used successfully to reduce automatic speech recognition errors for esophageal speakers.
A laryngectomee is a person whose larynx has been removed by surgery, usually due to laryngeal cancer. After surgery, most laryngectomees are able to speak again, using techniques learned with the help of a speech therapist. This is termed alaryngeal speech, and esophageal speech (ES) is one of several alaryngeal speech production modes. A considerable amount of research has been dedicated to the study of alaryngeal speech, with a wide range of aims such as helping speech therapists with evaluation and diagnosis, and improving its quality and intelligibility using digital signal processing techniques. We present a database of Spanish ES voices, named AhoSLABI, which is designed to allow the development of new support technologies for this speech impairment. The database primarily consists of recordings of 31 laryngectomees (27 males and 4 females) pronouncing phonetically balanced sentences. Additionally, it includes parallel recordings of the sentences by 9 healthy speakers (6 males and 3 females) to facilitate speech processing tasks that require small parallel corpora, such as voice conversion or synthetic speech adaptation. Apart from the sentences, the database includes sustained vowels and a small set of isolated words, which can be valuable for research on ES analysis, diagnosis and evaluation. The paper describes the main contents of the database, the recording protocols and procedure, and the labeling process. The main acoustic characteristics of the voices, such as speaking rate and the durations of recordings, phones and silences, are compared with those of a reduced set of healthy voices. In addition, we describe an experiment that uses the database to improve the performance of an automatic speech recognition (ASR) system for ES speakers. This new resource will be made available to the scientific community with the hope that it will be used to improve the quality of |
---|---|
ISSN: | 0885-2308 1095-8363 |
DOI: | 10.1016/j.csl.2020.101168 |