DISCOLEAF: Personalized DIScretization of COntinuous Attributes for LEArning with Federated Decision Trees

Bibliographic Details
Main Authors:	Kwatra, Saloni; Torra, Vicenç
Format:	Book chapter
Language:	English
Subjects:	Computer Science; datalogi; Decision Tree Aggregation; Decision Trees; Differential Privacy; Discretization; Federated Learning
Online Access:	Full text

Description
Summary:	Federated learning is a distributed machine learning framework in which each client participating in the federation trains a machine learning model on its own data and shares the trained model information with a central server, which aggregates it and sends the aggregated information back to the distributed clients. The machine learning model we choose to work with is the decision tree, due to its simplicity and interpretability. We propose a full-fledged federated pipeline that includes discretization and learning with decision trees for horizontally partitioned data. Our federated discretization approach can be plugged in as a preprocessing step before any other federated learning algorithm. During discretization, we ensure that each client creates a number of discrete bins according to its own data or choice. Hence, our approach is both federated and personalized. After discretization, we propose applying the post randomization method to protect the discretized data with differential privacy guarantees. After protecting its database, each client trains a decision tree classifier locally on its protected database and shares the nodes, each containing the split attribute and the split value, with the central server. The central server selects the most frequently occurring split attribute and combines the split values. This process continues until all the nodes to be merged are leaf nodes. The central server then shares the merged tree with the distributed clients. Hence, our proposed framework performs personalized, privacy-preserving federated learning with decision trees by discretizing continuous attributes and then masking them prior to the training stage. We call our proposed framework DISCOLEAF.
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-031-69651-0_23
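
Illustrative sketch:	The summary names two mechanical steps: masking discretized values with the post randomization method, and merging client nodes on the server. The Python below is a minimal sketch of both under stated assumptions, not the chapter's implementation. The randomization shown is k-ary randomized response, one common way to obtain an epsilon-differential-privacy guarantee on categorical values; the chapter's exact transition matrix is not given in this record. Averaging the split values is likewise an assumption, since the summary only says the values are combined. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def pram_keep_probability(epsilon, k):
    """Probability of keeping the true bin under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def pram(bins, k, epsilon, rng):
    """Mask a list of bin indices (0..k-1).

    Assumption: each value is kept with probability e^eps / (e^eps + k - 1)
    and otherwise replaced by a different bin chosen uniformly at random.
    """
    p_keep = pram_keep_probability(epsilon, k)
    masked = []
    for b in bins:
        if rng.random() < p_keep:
            masked.append(b)
        else:
            others = [c for c in range(k) if c != b]
            masked.append(rng.choice(others))
    return masked


def merge_nodes(nodes):
    """Merge one node per client, given as (split_attribute, split_value).

    The most frequently occurring attribute wins. Assumption: the split
    values of the clients that chose it are combined by averaging.
    """
    counts = Counter(attr for attr, _ in nodes)
    top_attr, _ = counts.most_common(1)[0]
    values = [v for attr, v in nodes if attr == top_attr]
    return top_attr, sum(values) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    print(pram([0, 1, 2, 3, 1, 0], k=4, epsilon=1.0, rng=rng))
    print(merge_nodes([("age", 30.0), ("income", 5.0), ("age", 40.0)]))
```

A smaller epsilon lowers the keep probability and so masks more values; because k is a per-client parameter here, each client can apply the masking to its own number of bins.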