Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning

Bibliographic details
Published in:	International journal of advanced computer science & applications 2021, Vol.12 (12)
Main authors:	MVSV, Kiranmai; Haritha, D
Format:	Article
Language:	eng
Keywords:	Algorithms; Clusters; Computer science; Data integrity; Data processing; Leak detection; Leakage; Leaks; Machine learning; Methods; Queries; Query processing; Visibility
Online access:	Full text

Description
Abstract:	With the growth of distributed data processing, and with data being the fuel for every process, query processing times are expected to be significantly lower. Hence, the data is expected to be distributed, and during distribution the chances of data leakage increase to a significant extent. Data leakage problems are generally not caused by intentional errors; rather, they are caused by the higher visibility of the data across multiple clusters. The detection process is therefore critical. Many parallel research efforts have demonstrated methods for both detection and prevention. Existing work on detecting data leaks depends heavily either on historical information about leaks or on the contextual importance of the data. In both cases, the accuracy of the detection process cannot be ensured. On the other hand, preventive measures can be turned into a reactive detection process by reversing the principles proposed in that research, but the computational complexity is significantly higher. Thus, this work proposes a novel strategy for detecting data leakage after data distribution, during query processing events. It first proposes an Occurrence Based Rule Set Extraction method using Adaptive Threshold to generate the rule sets. To reduce the time complexity and the loss of dataset attribute information, it then introduces an algorithm for Dynamic Inference-based Rule Set Reduction. After the inferences are generated, it deploys the Attribute Subset Equivalence-based Leak Detection mechanism for the final detection of clusters with data leaks. The work demonstrates nearly 89% accuracy for the detection process.
ISSN:	2158-107X 2156-5570
DOI:	10.14569/IJACSA.2021.0121237