His-GAN: A histogram-based GAN model to improve data generation quality

Generative Adversarial Network (GAN) has become an active research field due to its capability to generate quality simulation data. However, two consistent distributions (generated data distribution and original data distribution) produced by GAN cannot guarantee that generated data are always close...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Neural networks 2019-11, Vol.119, p.31-45
Hauptverfasser: Li, Wei, Ding, Wei, Sadasivam, Rajani, Cui, Xiaohui, Chen, Ping
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Generative Adversarial Network (GAN) has become an active research field due to its capability to generate quality simulation data. However, two consistent distributions (generated data distribution and original data distribution) produced by GAN cannot guarantee that generated data are always close to real data. Traditionally GAN is mainly applied to images, and it becomes more challenging for numeric datasets. In this paper, we propose a histogram-based GAN model (His-GAN). The purpose of our proposed model is to help GAN produce generated data with high quality. Specifically, we map generated data and original data into a histogram, then we count probability percentile on each bin and calculate dissimilarity with traditional f-divergence measures (e.g., Hellinger distance, Jensen–Shannon divergence) and Histogram Intersection Kernel. After that, we incorporate this dissimilarity score into training of the GAN model to update the generator’s parameters to improve generated data quality. This is because the parameters have an influence on the generated data quality. Moreover, we revised GAN training process by feeding GAN model with one group of samples (these samples can come from one class or one cluster that hold similar characteristics) each time, so the final generated data could contain the characteristics from a single group to overcome the challenge of figuring out complex characteristics from mixed groups/clusters of data. In this way, we can generate data that is more indistinguishable from original data. We conduct extensive experiments to validate our idea with MNIST, CIFAR-10, and a real-world numeric dataset, and the results clearly show the effectiveness of our approach.
ISSN:0893-6080
1879-2782
DOI:10.1016/j.neunet.2019.07.001