Effective Job Execution in Hadoop Over Authorized Deduplicated Data


Bibliographic Details
Published in: Webology, 2020-12, Vol. 17 (2), p. 430-444
Authors: Thanekar, Sachin Arun; Subrahmanyam, K.; Bagwan, A.B.
Format: Article
Language: English
Online access: Full text
Abstract: Hadoop treats every job as independent and discards the metadata of preceding jobs, so each job must read its data from all Data Nodes again, and relationships between related jobs are never examined. HDFS is further weakened by its lack of support for creating specific user identities, forming groups, and managing user credentials. Together these shortcomings degrade overall Hadoop performance, so there is a need to improve it through metadata reuse, better space management, deduplication-aware task execution, and data secured by access-rights specifications. The proposed system uses a task-deduplication technique that checks the similarity between jobs by comparing block IDs. Job metadata and data-locality details are stored on the Name Node, which results in better job execution, and the metadata of executed jobs is preserved so that recomputation time can be saved. Experimental results show improved job execution time and reduced storage space, and thus better overall Hadoop performance.
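The abstract describes deduplicating jobs by comparing the HDFS block IDs they read and reusing preserved job metadata to skip recomputation. Below is a minimal, hypothetical sketch of that idea; the class and function names (`MetadataStore`, `run_job`) are illustrative assumptions, not the authors' implementation or any real Hadoop API.

```python
# Hypothetical sketch: a job is summarized by the set of HDFS block IDs it
# reads; a cache (standing in for job metadata preserved on the Name Node)
# is consulted before scheduling, so a job over the same blocks reuses the
# earlier result instead of recomputing it.
from dataclasses import dataclass, field

@dataclass
class MetadataStore:
    """Stands in for the job metadata the paper keeps on the Name Node."""
    _cache: dict = field(default_factory=dict)

    def lookup(self, block_ids):
        # Exact-match deduplication: an identical block set means the
        # earlier job's output can be reused.
        return self._cache.get(frozenset(block_ids))

    def record(self, block_ids, result):
        self._cache[frozenset(block_ids)] = result

def run_job(store, block_ids, compute):
    cached = store.lookup(block_ids)
    if cached is not None:
        return cached, True          # reused prior result, no recomputation
    result = compute(block_ids)      # stand-in for the actual MapReduce job
    store.record(block_ids, result)
    return result, False

store = MetadataStore()
out1, reused1 = run_job(store, {"blk_1", "blk_2"}, lambda b: sorted(b))
out2, reused2 = run_job(store, {"blk_2", "blk_1"}, lambda b: sorted(b))
# The second job reads the same blocks, so its result is served from the cache.
```

A production version would also need similarity thresholds for partially overlapping block sets and invalidation when blocks change, neither of which this sketch models.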
DOI: 10.14704/WEB/V17I2/WEB17043
Publisher: Dr. Alireza Noruzi, University of Tehran, Department of Library and Information Science (Tehran)
ISSN: 1735-188X
Source: Elektronische Zeitschriftenbibliothek (freely accessible e-journals)
Subjects: Algorithms; Datasets; Employment; Metadata