H-Voice: Fake voice histograms (Imitation+DeepVoice)

Bibliographic details
Main author:	Ballesteros L, Dora Maria
Format:	Dataset
Language:	English
Subjects:	Computer Vision; Machine Learning; Speech Processing

Description
Summary:	This dataset consists of 6672 histograms of original voice recordings and of fake voice recordings obtained by Imitation [1, 2] and Deep Voice [3]. The histograms can be used to train a machine learning system to classify original and fake voice recordings obtained with the Imitation and Deep Voice algorithms.

Note: corrupted images have been fixed.

Each directory has the following composition:
Training_fake: 2088 histograms of fake voice recordings (2016 with Imitation and 72 with Deep Voice)
Training_original: 2020 histograms of original voice recordings
Validation_fake: 864 histograms of fake voice recordings (all with Imitation)
Validation_original: 864 histograms of original voice recordings
External_test1: 760 histograms (380 original + 380 fake with Imitation)
External_test2: 76 histograms (4 original + 72 fake with Deep Voice)

References:
[1] D.M. Ballesteros L, J.M. Moreno A. Highly transparent steganography model of speech signals using Efficient Wavelet Masking. Expert Systems with Applications 39 (10), 2012, 9141-9149. https://doi.org/10.1016/j.eswa.2012.02.066
[2] D.M. Ballesteros L, J.M. Moreno A. On the ability of adaptation of speech signals and data hiding. Expert Systems with Applications 39 (16), 2012, 12574-12579. https://doi.org/10.1016/j.eswa.2012.05.027
[3] S.O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, M. Shoeybi. Deep Voice: Real-time Neural Text-to-Speech. 2017. https://arxiv.org/abs/1702.07825
DOI:	10.17632/k47yd3m28w
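
The directory composition given in the summary can be checked against a local copy before training. The following is a minimal sketch, not part of the dataset itself: it assumes the archive has been extracted so that the six directories sit under one root folder, and that the histograms are stored as image files with a .png, .jpg or .jpeg suffix. The suffix list, the function names and the example path are all hypothetical and may need adjusting.

```python
from pathlib import Path

# Per-directory counts as stated in the dataset summary.
EXPECTED_COUNTS = {
    "Training_fake": 2088,
    "Training_original": 2020,
    "Validation_fake": 864,
    "Validation_original": 864,
    "External_test1": 760,
    "External_test2": 76,
}

# Assumption: histograms are image files with one of these suffixes.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_histograms(root):
    """Return {directory name: number of image files found beneath it}."""
    root = Path(root)
    counts = {}
    for name in EXPECTED_COUNTS:
        folder = root / name
        if not folder.is_dir():
            # A missing directory is reported as zero files.
            counts[name] = 0
            continue
        counts[name] = sum(
            1
            for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def find_mismatches(counts):
    """Return {name: (found, expected)} for directories whose count differs."""
    return {
        name: (counts.get(name, 0), expected)
        for name, expected in EXPECTED_COUNTS.items()
        if counts.get(name, 0) != expected
    }


# The per-directory figures add up to the stated dataset size.
assert sum(EXPECTED_COUNTS.values()) == 6672
```

For example, find_mismatches(count_histograms("path/to/H-Voice")) returns an empty dictionary when every directory holds the stated number of files; the path shown is a placeholder.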