Attribute-Distributed Learning: Models, Limits, and Algorithms

Bibliographic details
Published in:	IEEE Transactions on Signal Processing, 2011-01, Vol. 59 (1), p. 386-398
Main authors:	Zheng, Haipeng; Kulkarni, S R; Poor, H V
Format:	Article
Language:	English
Keywords:	Additives; Algorithm design and analysis; Algorithms; Applied sciences; Collaboration; Computer simulation; Convergence; Derivatives; Distributed databases; Distributed information systems; distributed processing; Exact sciences and technology; Information, signal and communications theory; Learning; Mathematical models; Miscellaneous; Partitioning algorithms; Prediction algorithms; Projection; Refitting; Regression; Signal processing; statistical learning; Studies; Telecommunications and information theory; Training

Description
Abstract:	This paper introduces a framework for distributed learning (regression) on attribute-distributed data. First, the convergence properties of attribute-distributed regression with an additive model and a fusion center are discussed, and the convergence rate and uniqueness of the limit are shown for some special cases. Then, taking residual refitting (or boosting) as a prototype algorithm, three different schemes (Simple Iterative Projection, a greedy algorithm, and a parallel algorithm with its derivatives) are proposed and compared. Among these algorithms, the first two are sequential and have low communication overhead, but are susceptible to overtraining. The parallel algorithm has the best performance, but has significant communication requirements. Instead of directly refitting the ensemble residual sequentially, the parallel algorithm redistributes the residual to each agent in proportion to the coefficients of the optimal linear combination of the current individual estimators. Designing residual redistribution schemes also improves the ability to eliminate irrelevant attributes. The performance of the algorithms is compared via extensive simulations. Communication issues are also considered: the amount of data that each of the three algorithms must exchange is compared, and the three methods are generalized to scenarios without a fusion center.
ISSN:	1053-587X 1941-0476
DOI:	10.1109/TSP.2010.2088393
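
Illustrative note:	The abstract describes sequential residual refitting on attribute-distributed data, where each agent sees only its own subset of attributes. The sketch below is a minimal illustration of that general idea, not the paper's algorithms: it assumes linear least-squares estimators at each agent, a fixed cyclic order, and synthetic data, and the names `cyclic_residual_refitting`, `blocks`, and `n_rounds` are hypothetical. Agents take turns fitting the current ensemble residual on their own columns, so the fitted model is additive across agents.

```python
import numpy as np

def cyclic_residual_refitting(blocks, y, n_rounds=50):
    """Sequential residual refitting with an additive linear model.

    blocks: list of (n, d_i) arrays, one per agent (assumed setup).
    y: (n,) response vector, shared with every agent.
    Returns the per-agent coefficients and the final ensemble residual.
    """
    residual = np.asarray(y, dtype=float).copy()
    coefs = [np.zeros(X.shape[1]) for X in blocks]
    for _ in range(n_rounds):
        for i, X in enumerate(blocks):
            # Agent i fits the current residual using only its own attributes.
            delta, *_ = np.linalg.lstsq(X, residual, rcond=None)
            coefs[i] += delta
            # Only the updated residual needs to be passed on.
            residual -= X @ delta
    return coefs, residual

# Synthetic example (assumed data): six attributes split across three agents.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 6))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0])
y = X @ true_w + 0.01 * rng.normal(size=n)

blocks = [X[:, 0:2], X[:, 2:4], X[:, 4:6]]
coefs, residual = cyclic_residual_refitting(blocks, y)
mse = float(np.mean(residual ** 2))
print("training MSE:", mse)
```

In this sketch each step only requires passing the residual vector between agents, which is consistent with the low communication overhead the abstract attributes to the sequential schemes. It has no stopping rule or held-out data, so it does nothing to guard against the overtraining the abstract mentions.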