Distributed statistical optimization for non-randomly stored big data with application to penalized learning


Bibliographic Details
Published in: Statistics and Computing, 2023-06, Vol. 33 (3), Article 73
Main authors: Wang, Kangning; Li, Shaomin
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract: Distributed optimization for big data has recently attracted enormous attention. However, the existing algorithms all rest on one critical randomness condition, namely that the big data are randomly distributed across different machines. This seldom holds in practice, and violating it can seriously degrade estimation accuracy. To fix this problem, we propose a pilot-dataset, surrogate-loss-based optimization framework, which achieves communication-efficient distributed optimization for non-randomly distributed big data. Furthermore, we also apply it to penalized high-dimensional sparse learning problems by combining it with penalty functions. Theoretical properties and numerical results both confirm the good performance of the proposed methods.
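The pilot-dataset surrogate-loss idea from the abstract can be illustrated with a minimal sketch. The snippet below is NOT the authors' algorithm, only an assumed instantiation for squared loss: machines hold data whose covariate distribution drifts with the machine index (mimicking non-random storage), each machine ships a small random subsample to form a pilot dataset on the master, and the master minimizes a gradient-corrected surrogate loss of the form L_pilot(θ) − ⟨g_pilot(θ0) − g_global(θ0), θ⟩, which needs only one extra round of gradient communication. All simulation settings and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate big data stored NON-randomly across machines: the covariate
# --- mean drifts with the machine index (e.g. data sorted by region).
n_machines, n_local, p = 10, 500, 5
theta_true = np.linspace(1.0, 2.0, p)
machines = []
for m in range(n_machines):
    X = rng.normal(loc=0.3 * m, scale=1.0, size=(n_local, p))
    y = X @ theta_true + rng.normal(size=n_local)
    machines.append((X, y))

def local_grad(X, y, theta):
    """Gradient of the average squared loss 0.5 * mean((y - X @ theta)**2)."""
    return X.T @ (X @ theta - y) / len(y)

# Step 1: each machine ships a small RANDOM subsample; together these form
# the pilot dataset held on the master (one round of communication).
pilot_idx = [rng.choice(n_local, size=50, replace=False) for _ in machines]
Xp = np.vstack([X[idx] for (X, _), idx in zip(machines, pilot_idx)])
yp = np.concatenate([y[idx] for (_, y), idx in zip(machines, pilot_idx)])

# Step 2: initial estimate from the pilot data alone (ordinary least squares).
theta0 = np.linalg.lstsq(Xp, yp, rcond=None)[0]

# Step 3: one more round -- every machine returns its local gradient at
# theta0; the master averages them into the global gradient.
g_global = np.mean([local_grad(X, y, theta0) for X, y in machines], axis=0)
g_pilot = local_grad(Xp, yp, theta0)  # ~0 here, since theta0 minimizes the pilot loss

# Step 4: minimize the gradient-corrected surrogate loss on the pilot data,
#   L_pilot(theta) - (g_pilot - g_global)^T theta,
# which for squared loss reduces to a p x p linear system.
H_pilot = Xp.T @ Xp / len(yp)  # pilot Hessian
theta_surr = np.linalg.solve(H_pilot, Xp.T @ yp / len(yp) + g_pilot - g_global)

# Full-data OLS for reference: what we could get without communication limits.
X_all = np.vstack([X for X, _ in machines])
y_all = np.concatenate([y for _, y in machines])
theta_full = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print(np.round(theta_surr - theta_full, 4))
```

Because the pilot sample is drawn randomly from every machine, its Hessian approximates the global one, so a single gradient-correction step moves the surrogate estimate close to the full-data least-squares fit while communicating only a subsample and one gradient per machine.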
ISSN: 0960-3174, 1573-1375
DOI: 10.1007/s11222-023-10247-x