Efficient Divide-and-Conquer Classification Based on Parallel Feature-Space Decomposition for Distributed Systems

Bibliographic Details
Published in:	IEEE Systems Journal, 2018-06, Vol. 12 (2), pp. 1492-1498
Main authors:	Guo, Qi; Chen, Bo-Wei; Rho, Seungmin; Ji, Wen; Jiang, Feng; Ji, Xiangyang; Kung, Sun-Yuan
Format:	Article
Language:	English (eng)
Subjects:	Accuracy; Classification; Classifiers; Computer networks; Datasets; Decomposition; divide and conquer (DC); feature-space decomposition; feature-space division; fusion; Indexes; Kernel; Matrix decomposition; Performance evaluation; Solvers; Subspaces; Support vector machines; Time complexity; Training

Description
Abstract:	This paper presents a divide-and-conquer (DC) approach to classification based on feature-space decomposition. For large-scale data sets, typical approaches employ truncated kernel methods on the feature space or DC approaches on the sample space. However, these approaches do not guarantee separability between classes, owing to overfitting. To overcome this problem, this paper proposes a novel DC approach on feature spaces consisting of three steps. First, we divide the feature space into several subspaces using the decomposition method proposed in this paper. Subsequently, these feature subspaces are sent to individual local classifiers for training. Finally, the outcomes of the local classifiers are fused to generate the final classification results. We also propose a Cascade-TRBFKRR classifier that reweights training samples for data refinement. Experiments on large-scale data sets are carried out for performance evaluation. The results show that the proposed DC method achieves lower error rates than state-of-the-art fast support vector machine solvers, e.g., reducing error rates by 10.53% and 7.53% on the RCV1 and covtype data sets, respectively.
ISSN:	1932-8184, 1937-9234
DOI:	10.1109/JSYST.2015.2478800
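
Illustrative sketch (not part of the catalog record or the paper): the three steps named in the abstract, namely splitting the feature space, training one local classifier per subspace, and fusing the local outputs, can be pictured roughly as follows. Everything specific here is an assumption made for illustration: the contiguous block split stands in for the paper's decomposition method, a nearest-centroid rule stands in for the local kernel classifiers, and majority voting stands in for the fusion rule. The Cascade-TRBFKRR reweighting step is not covered, and all function names are hypothetical.

```python
from collections import Counter


def split_features(n_features, n_subspaces):
    """Partition feature indices into contiguous blocks (assumed scheme)."""
    size, extra = divmod(n_features, n_subspaces)
    blocks, start = [], 0
    for i in range(n_subspaces):
        stop = start + size + (1 if i < extra else 0)
        blocks.append(list(range(start, stop)))
        start = stop
    return blocks


def train_centroids(X, y, cols):
    """Stand-in local classifier: per-class mean over the columns in `cols`."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for j, c in enumerate(cols):
            acc[j] += row[c]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_local(centroids, row, cols):
    """Return the class whose centroid is nearest within this subspace."""
    def dist(center):
        return sum((row[c] - center[j]) ** 2 for j, c in enumerate(cols))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))


def fit(X, y, n_subspaces):
    """Steps 1 and 2: split the features, then train one classifier per block."""
    blocks = split_features(len(X[0]), n_subspaces)
    return [(cols, train_centroids(X, y, cols)) for cols in blocks]


def predict(model, row):
    """Step 3: fuse local decisions by majority vote (assumed fusion rule)."""
    votes = Counter(predict_local(cent, row, cols) for cols, cent in model)
    # Ties are broken in favor of the smallest label.
    return max(sorted(votes), key=lambda label: votes[label])


# Toy data, invented for this sketch: two well-separated classes, four features.
X = [[0, 0, 0, 0], [0, 1, 0, 1], [5, 5, 5, 5], [5, 6, 5, 6]]
y = [0, 0, 1, 1]
model = fit(X, y, n_subspaces=2)
print(predict(model, [0.2, 0.1, 0.3, 0.2]))
```

Because each local classifier depends only on its own block of columns, the loop in `fit` could be distributed across workers, which is the setting the title refers to. How the paper actually schedules or combines the local classifiers is not stated in this record.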