An Investigation of Smart Contract for Collaborative Machine Learning Model Training
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | Machine learning (ML) has penetrated various fields in the era of big data.
The advantage of collaborative machine learning (CML) over most conventional ML
lies in the joint effort of decentralized nodes or agents that results in
better model performance and generalization. Because training ML models
requires a massive amount of good-quality data, it is necessary to address
concerns about data privacy while ensuring data quality. To this end, we
investigate the integration of CML and smart contracts. Built on blockchain,
smart contracts enable automatic execution of data preservation and
validation, as well as the continuity of CML model training. In our
simulation experiments, we define incentive mechanisms on the smart contract,
investigate important factors such as the number of features in the dataset
(num_words), the size of the training data, and the cost for data holders to
submit data, and determine how these factors affect the model's performance
metrics: the accuracy of the trained model, the gap between the model's
accuracy before and after the simulation, and the time taken to exhaust the
bad agent's balance. For instance, in our experimental results, increasing
num_words leads to higher model accuracy and eliminates the negative
influence of malicious agents in a shorter time. Statistical analyses show
that with the help of smart contracts, the influence of invalid data is
efficiently diminished and model robustness is maintained. We also discuss
gaps in existing research and propose possible directions for future work. |
---|---|
DOI: | 10.48550/arxiv.2209.05017 |