Fast semi-supervised discriminant analysis for binary classification of large data-sets


Bibliographic Details
Published in:	arXiv.org 2018-03
Main authors:	Tavernier, Joris; Simm, Jaak; Meerbergen, Karl; Wegner, Joerg Kurt; Ceulemans, Hugo; Moreau, Yves
Format:	Article
Language:	eng
Subjects:	Algorithms; Computer Science - Artificial Intelligence; Computer Science - Numerical Analysis; Computer Science - Performance; Discriminant analysis; Performance prediction; Proteins; State of the art; Subspace methods; Subspaces
Online access:	Full text

Description
Summary:	High-dimensional data requires scalable algorithms. We propose and analyze three related, scalable algorithms for semi-supervised discriminant analysis (SDA). These methods build on Krylov subspace methods, which exploit data sparsity and the shift-invariance of Krylov subspaces. In addition, the problem definition is improved by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and that our methods require only a few seconds, significantly improving computation time over the previous state of the art.
ISSN:	2331-8422
DOI:	10.48550/arxiv.1709.04794