Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning

With the growth of distributed data processing, and with data being the fuel for each process, the query processing time for the data is expected to be significantly lower. Hence, the data is expected to be widely distributed, and during distribution the chances of data leakage i...

Detailed description

Bibliographic details
Published in:	International journal of advanced computer science & applications, 2021, Vol.12 (12)
Main authors:	MVSV, Kiranmai; Haritha, D
Format:	Article
Language:	eng
Subjects:	Algorithms; Clusters; Computer science; Data integrity; Data processing; Leak detection; Leakage; Leaks; Machine learning; Methods; Queries; Query processing; Visibility
Online access:	Full text

container_end_page
container_issue	12
container_start_page
container_title	International journal of advanced computer science & applications
container_volume	12
creator	MVSV, Kiranmai; Haritha, D
description	With the growth of distributed data processing, and with data being the fuel for each process, the query processing time for the data is expected to be significantly lower. Hence, the data is expected to be widely distributed, and during distribution the chances of data leakage increase to a significant extent. Data leakage problems are generally not caused by intentional errors; rather, they are caused by the higher visibility of the data across multiple clusters. Hence, the detection process is also critical. Many parallel research efforts have demonstrated various methods for detection as well as prevention. Existing work on detecting data leaks depends heavily either on historical information about leaks or on the contextual importance of the data. In both cases, the accuracy of the detection process cannot be ensured. On the other hand, preventive measures can be turned into a reactive detection process by reversing the principles proposed in those research outcomes, but the computational complexity is significantly higher. Thus, this work proposes a novel strategy for detecting data leakage after data distribution, during query processing events. It first proposes an Occurrence Based Rule Set Extraction method using Adaptive Threshold for generating the rule sets. To reduce the time complexity and the loss of dataset attribute information, it then introduces a second algorithm, Dynamic Inference-based Rule Set Reduction. After the inferences are generated, it finally deploys the Attribute Subset Equivalence-based Leak Detection mechanism to identify the clusters with data leaks. This work demonstrates nearly 89% accuracy for the detection process.
doi_str_mv	10.14569/IJACSA.2021.0121237
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2655113401</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2655113401</sourcerecordid><originalsourceid>FETCH-LOGICAL-c274t-103ad5ed96c5b62971b7129c361a06ad313a9d03debe6bda9c26cee344a45c863</originalsourceid><addsrcrecordid>eNotkMtOwzAQRS0EElXpH7CwxDrFY8dOvawaHkVBgAoSO8txpm1KSYrtLPr3pI9ZzIxGV_eODiG3wMaQSqXv5y_T2WI65ozDmAEHLrILMuAgVSJlxi6P-yQBln1fk1EIG9aX0FxNxICYHCO6WLcNbZc0t9HSAu1PoHHt2261poX1K6QLZ7dI8zpEX5ddxIp-dOj39N23DkOomxXtjv3VunXd4MHEN_3hhlwt7Tbg6DyH5Ovx4XP2nBRvT_PZtEgcz9LYPydsJbHSyslScZ1BmQHXTiiwTNlKgLC6YqLCElVZWe24cogiTW0q3USJIbk7-e58-9dhiGbTdr7pIw1XUgKIlEGvSk8q59sQPC7Nzte_1u8NMHOkaU40zYGmOdMU_-99aCo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2655113401</pqid></control><display><type>article</type><title>Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning</title><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>MVSV, Kiranmai ; Haritha, D</creator><creatorcontrib>MVSV, Kiranmai ; Haritha, D</creatorcontrib><description>With the growth in the distributed data processing and data being the fuel for each of the processes, the query processes of the data are expected to be significantly lower. Hence, the distribution of the data is highly expected and during the distributing of the data, the chances for data leakage increases to a significant extend. The data leakage problems are not generally caused by intentional errors, rather this is caused by the higher visibility of the data over multiple clusters. Henceforth, the detection process is also very critical. Many of the parallel research attempts have demonstrated various methods for the detection and as well as the prevention methods. 
The works in the direction of the detection of the data leaks are highly dependent either on the historical information of the leaks or depends on the contextual importance of the data. In both the cases, the outcomes of the detection process accuracy cannot be ensured. In the other hand, the preventive measures can also turn into a reactive process for detection by reversing the principles proposed in these research outcomes, but the computational complexities are significantly higher. Thus, this work proposes a novel strategy for detection of the data leakages after the data distribution during the query processing events. This work proposes an initial Occurrence Based Rule Set Extraction method using Adaptive Threshold for generating the rulesets, further for reducing the time complexity and reducing the loss of dataset attribute information, this work introduces yet another algorithm for Dynamic Inference-based Rule Set Reduction. After the inferences are generated, finally this work deploys the Attribute Subset Equivalence-based Leak Detection mechanism for final detection of the clusters with data leaks. This work demonstrates nearly 89% accuracy for the detection process.</description><identifier>ISSN: 2158-107X</identifier><identifier>EISSN: 2156-5570</identifier><identifier>DOI: 10.14569/IJACSA.2021.0121237</identifier><language>eng</language><publisher>West Yorkshire: Science and Information (SAI) Organization Limited</publisher><subject>Algorithms ; Clusters ; Computer science ; Data integrity ; Data processing ; Leak detection ; Leakage ; Leaks ; Machine learning ; Methods ; Queries ; Query processing ; Visibility</subject><ispartof>International journal of advanced computer science & applications, 2021, Vol.12 (12)</ispartof><rights>2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,4014,27914,27915,27916</link.rule.ids></links><search><creatorcontrib>MVSV, Kiranmai</creatorcontrib><creatorcontrib>Haritha, D</creatorcontrib><title>Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning</title><title>International journal of advanced computer science & applications</title><description>With the growth in the distributed data processing and data being the fuel for each of the processes, the query processes of the data are expected to be significantly lower. Hence, the distribution of the data is highly expected and during the distributing of the data, the chances for data leakage increases to a significant extend. The data leakage problems are not generally caused by intentional errors, rather this is caused by the higher visibility of the data over multiple clusters. Henceforth, the detection process is also very critical. Many of the parallel research attempts have demonstrated various methods for the detection and as well as the prevention methods. The works in the direction of the detection of the data leaks are highly dependent either on the historical information of the leaks or depends on the contextual importance of the data. In both the cases, the outcomes of the detection process accuracy cannot be ensured. In the other hand, the preventive measures can also turn into a reactive process for detection by reversing the principles proposed in these research outcomes, but the computational complexities are significantly higher. 
Thus, this work proposes a novel strategy for detection of the data leakages after the data distribution during the query processing events. This work proposes an initial Occurrence Based Rule Set Extraction method using Adaptive Threshold for generating the rulesets, further for reducing the time complexity and reducing the loss of dataset attribute information, this work introduces yet another algorithm for Dynamic Inference-based Rule Set Reduction. After the inferences are generated, finally this work deploys the Attribute Subset Equivalence-based Leak Detection mechanism for final detection of the clusters with data leaks. This work demonstrates nearly 89% accuracy for the detection process.</description><subject>Algorithms</subject><subject>Clusters</subject><subject>Computer science</subject><subject>Data integrity</subject><subject>Data processing</subject><subject>Leak detection</subject><subject>Leakage</subject><subject>Leaks</subject><subject>Machine learning</subject><subject>Methods</subject><subject>Queries</subject><subject>Query 
processing</subject><subject>Visibility</subject><issn>2158-107X</issn><issn>2156-5570</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNotkMtOwzAQRS0EElXpH7CwxDrFY8dOvawaHkVBgAoSO8txpm1KSYrtLPr3pI9ZzIxGV_eODiG3wMaQSqXv5y_T2WI65ozDmAEHLrILMuAgVSJlxi6P-yQBln1fk1EIG9aX0FxNxICYHCO6WLcNbZc0t9HSAu1PoHHt2261poX1K6QLZ7dI8zpEX5ddxIp-dOj39N23DkOomxXtjv3VunXd4MHEN_3hhlwt7Tbg6DyH5Ovx4XP2nBRvT_PZtEgcz9LYPydsJbHSyslScZ1BmQHXTiiwTNlKgLC6YqLCElVZWe24cogiTW0q3USJIbk7-e58-9dhiGbTdr7pIw1XUgKIlEGvSk8q59sQPC7Nzte_1u8NMHOkaU40zYGmOdMU_-99aCo</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>MVSV, Kiranmai</creator><creator>Haritha, D</creator><general>Science and Information (SAI) Organization Limited</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7XB</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>2021</creationdate><title>Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning</title><author>MVSV, Kiranmai ; Haritha, 
D</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c274t-103ad5ed96c5b62971b7129c361a06ad313a9d03debe6bda9c26cee344a45c863</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Clusters</topic><topic>Computer science</topic><topic>Data integrity</topic><topic>Data processing</topic><topic>Leak detection</topic><topic>Leakage</topic><topic>Leaks</topic><topic>Machine learning</topic><topic>Methods</topic><topic>Queries</topic><topic>Query processing</topic><topic>Visibility</topic><toplevel>online_resources</toplevel><creatorcontrib>MVSV, Kiranmai</creatorcontrib><creatorcontrib>Haritha, D</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection (ProQuest)</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies & Aerospace 
Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>International journal of advanced computer science & applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>MVSV, Kiranmai</au><au>Haritha, D</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning</atitle><jtitle>International journal of advanced computer science & applications</jtitle><date>2021</date><risdate>2021</risdate><volume>12</volume><issue>12</issue><issn>2158-107X</issn><eissn>2156-5570</eissn><abstract>With the growth in the distributed data processing and data being the fuel for each of the processes, the query processes of the data are expected to be significantly lower. Hence, the distribution of the data is highly expected and during the distributing of the data, the chances for data leakage increases to a significant extend. The data leakage problems are not generally caused by intentional errors, rather this is caused by the higher visibility of the data over multiple clusters. Henceforth, the detection process is also very critical. Many of the parallel research attempts have demonstrated various methods for the detection and as well as the prevention methods. The works in the direction of the detection of the data leaks are highly dependent either on the historical information of the leaks or depends on the contextual importance of the data. 
In both the cases, the outcomes of the detection process accuracy cannot be ensured. In the other hand, the preventive measures can also turn into a reactive process for detection by reversing the principles proposed in these research outcomes, but the computational complexities are significantly higher. Thus, this work proposes a novel strategy for detection of the data leakages after the data distribution during the query processing events. This work proposes an initial Occurrence Based Rule Set Extraction method using Adaptive Threshold for generating the rulesets, further for reducing the time complexity and reducing the loss of dataset attribute information, this work introduces yet another algorithm for Dynamic Inference-based Rule Set Reduction. After the inferences are generated, finally this work deploys the Attribute Subset Equivalence-based Leak Detection mechanism for final detection of the clusters with data leaks. This work demonstrates nearly 89% accuracy for the detection process.</abstract><cop>West Yorkshire</cop><pub>Science and Information (SAI) Organization Limited</pub><doi>10.14569/IJACSA.2021.0121237</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2158-107X
ispartof	International journal of advanced computer science & applications, 2021, Vol.12 (12)
issn	2158-107X 2156-5570
language	eng
recordid	cdi_proquest_journals_2655113401
source	Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	Algorithms; Clusters; Computer science; Data integrity; Data processing; Leak detection; Leakage; Leaks; Machine learning; Methods; Queries; Query processing; Visibility
title	Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T05%3A05%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Detection%20of%20Data%20Leaks%20through%20Large%20Scale%20Distributed%20Query%20Processing%20using%20Machine%20Learning&rft.jtitle=International%20journal%20of%20advanced%20computer%20science%20&%20applications&rft.au=MVSV,%20Kiranmai&rft.date=2021&rft.volume=12&rft.issue=12&rft.issn=2158-107X&rft.eissn=2156-5570&rft_id=info:doi/10.14569/IJACSA.2021.0121237&rft_dat=%3Cproquest_cross%3E2655113401%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2655113401&rft_id=info:pmid/&rfr_iscdi=true