Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning
As distributed data processing grows and data becomes the fuel for every process, query processing times are expected to be significantly lower. Data is therefore routinely distributed, and during distribution the chance of data leakage i...
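Published in: | International journal of advanced computer science & applications 2021, Vol.12 (12) |
---|---|
Main authors: | MVSV, Kiranmai ; Haritha, D |
Format: | Article |
Language: | eng |
Subjects: | Algorithms ; Clusters ; Computer science ; Data integrity ; Data processing ; Leak detection ; Leakage ; Leaks ; Machine learning ; Methods ; Queries ; Query processing ; Visibility |
Online access: | Full text |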
container_end_page | |
---|---|
container_issue | 12 |
container_start_page | |
container_title | International journal of advanced computer science & applications |
container_volume | 12 |
creator | MVSV, Kiranmai ; Haritha, D |
description | As distributed data processing grows and data becomes the fuel for every process, query processing times are expected to be significantly lower. Data is therefore routinely distributed, and during distribution the chance of data leakage increases to a significant extent. Data leakage is generally caused not by intentional errors but by the higher visibility of the data across multiple clusters, which makes the detection process critical. Many parallel research efforts have demonstrated methods for both detection and prevention. Existing detection approaches depend heavily either on historical information about leaks or on the contextual importance of the data; in both cases, the accuracy of the detection outcome cannot be ensured. On the other hand, preventive measures can be turned into a reactive detection process by reversing the principles they propose, but the computational complexity is significantly higher. This work therefore proposes a novel strategy for detecting data leaks after data distribution, during query processing events. It first proposes an Occurrence Based Rule Set Extraction method using Adaptive Threshold to generate the rule sets. To reduce time complexity and the loss of dataset attribute information, it then introduces a second algorithm for Dynamic Inference-based Rule Set Reduction. After the inferences are generated, it deploys the Attribute Subset Equivalence-based Leak Detection mechanism for the final detection of clusters with data leaks. The work demonstrates nearly 89% accuracy for the detection process. |
doi_str_mv | 10.14569/IJACSA.2021.0121237 |
format | Article |
publisher | West Yorkshire: Science and Information (SAI) Organization Limited |
rights | 2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 2158-107X |
ispartof | International journal of advanced computer science & applications, 2021, Vol.12 (12) |
issn | 2158-107X 2156-5570 |
language | eng |
recordid | cdi_proquest_journals_2655113401 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Algorithms ; Clusters ; Computer science ; Data integrity ; Data processing ; Leak detection ; Leakage ; Leaks ; Machine learning ; Methods ; Queries ; Query processing ; Visibility |
title | Detection of Data Leaks through Large Scale Distributed Query Processing using Machine Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T05%3A05%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Detection%20of%20Data%20Leaks%20through%20Large%20Scale%20Distributed%20Query%20Processing%20using%20Machine%20Learning&rft.jtitle=International%20journal%20of%20advanced%20computer%20science%20&%20applications&rft.au=MVSV,%20Kiranmai&rft.date=2021&rft.volume=12&rft.issue=12&rft.issn=2158-107X&rft.eissn=2156-5570&rft_id=info:doi/10.14569/IJACSA.2021.0121237&rft_dat=%3Cproquest_cross%3E2655113401%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2655113401&rft_id=info:pmid/&rfr_iscdi=true |
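
The abstract names three stages (rule set extraction with an adaptive threshold, rule set reduction, and attribute-subset-based leak detection) but this record gives no algorithmic detail. The following is only a toy sketch of how such a pipeline might be shaped, not the authors' method. Every function name, the use of the mean occurrence count as the "adaptive threshold", the subset-based reduction, and the `allowed` permission map are assumptions made for illustration.

```python
from collections import Counter


def extract_rules(query_log):
    """Stage 1 (assumed): keep attribute sets whose occurrence count
    meets an adaptive threshold, taken here to be the mean count."""
    counts = Counter(frozenset(attrs) for attrs in query_log)
    if not counts:
        return set()
    threshold = sum(counts.values()) / len(counts)
    return {attrs for attrs, n in counts.items() if n >= threshold}


def reduce_rules(rules):
    """Stage 2 (assumed): drop any rule that is a proper subset of
    another rule, since the larger rule already implies it."""
    return {r for r in rules if not any(r < other for other in rules)}


def detect_leaks(rules, cluster_attrs, allowed):
    """Stage 3 (assumed): flag a cluster when it exposes every attribute
    of a rule that it is not explicitly allowed to hold."""
    leaks = {}
    for cluster, visible in cluster_attrs.items():
        permitted = allowed.get(cluster, set())
        hits = [r for r in rules if r <= visible and r not in permitted]
        if hits:
            leaks[cluster] = hits
    return leaks


# Hypothetical query log: each entry lists the attributes one query returned.
query_log = [
    ["name", "ssn"], ["name", "ssn"],
    ["name", "ssn", "salary"], ["name", "ssn", "salary"],
    ["name"], ["city"],
]
rules = reduce_rules(extract_rules(query_log))

# Hypothetical view of which attributes each cluster can see.
cluster_attrs = {
    "c1": {"name", "city"},
    "c2": {"name", "ssn", "salary", "city"},
}
allowed = {"c1": set(), "c2": set()}
leaks = detect_leaks(rules, cluster_attrs, allowed)
print(leaks)
```

In this toy run only cluster `c2` is flagged, because it is the only cluster that exposes all attributes of the single rule that survives reduction. A real system would need the paper's actual threshold and inference definitions, which are not available from this record.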