Effective Job Execution in Hadoop Over Authorized Deduplicated Data


Bibliographic Details
Published in: Webology, 2020-12, Vol. 17 (2), p. 430-444
Authors: Thanekar, Sachin Arun; Subrahmanyam, K.; Bagwan, A.B.
Format: Article
Language: English
Online access: Full text
Abstract: Hadoop treats every job as independent and discards the metadata of preceding jobs, so each job must read its data from all Data Nodes again, and relationships between related jobs are never examined. HDFS is further weakened by its lack of support for creating specific user identities, forming groups, and managing user credentials. Together these shortcomings degrade overall Hadoop performance, so there is a need to improve it through metadata reuse, better space management, deduplication-aware task execution, and data secured by access-rights specifications. The proposed system uses a task-deduplication technique that checks the similarity between jobs by comparing block IDs. Job metadata and data-locality details are stored on the Name Node, which results in better job execution, and the metadata of executed jobs is preserved so that recomputation time can be saved. Experimental results show improved job execution time and reduced storage space, and thus better overall Hadoop performance.
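The abstract describes deduplicating jobs by comparing the HDFS block IDs they read and reusing preserved job metadata to skip recomputation. Below is a minimal, hypothetical sketch of that idea; the class and function names (`MetadataStore`, `run_job`) are illustrative assumptions, not the authors' implementation or any real Hadoop API.

```python
# Hypothetical sketch: a job is summarized by the set of HDFS block IDs it
# reads; a cache (standing in for job metadata preserved on the Name Node)
# is consulted before scheduling, so a job over the same blocks reuses the
# earlier result instead of recomputing it.
from dataclasses import dataclass, field

@dataclass
class MetadataStore:
    """Stands in for the job metadata the paper keeps on the Name Node."""
    _cache: dict = field(default_factory=dict)

    def lookup(self, block_ids):
        # Exact-match deduplication: an identical block set means the
        # earlier job's output can be reused.
        return self._cache.get(frozenset(block_ids))

    def record(self, block_ids, result):
        self._cache[frozenset(block_ids)] = result

def run_job(store, block_ids, compute):
    cached = store.lookup(block_ids)
    if cached is not None:
        return cached, True          # reused prior result, no recomputation
    result = compute(block_ids)      # stand-in for the actual MapReduce job
    store.record(block_ids, result)
    return result, False

store = MetadataStore()
out1, reused1 = run_job(store, {"blk_1", "blk_2"}, lambda b: sorted(b))
out2, reused2 = run_job(store, {"blk_2", "blk_1"}, lambda b: sorted(b))
# The second job reads the same blocks, so its result is served from the cache.
```

A production version would also need similarity thresholds for partially overlapping block sets and invalidation when blocks change, neither of which this sketch models.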
DOI: 10.14704/WEB/V17I2/WEB17043
Publisher: Dr. Alireza Noruzi, University of Tehran, Department of Library and Information Science (Tehran)
ISSN: 1735-188X
Source: Elektronische Zeitschriftenbibliothek (freely accessible e-journals)
Subjects: Algorithms; Datasets; Employment; Metadata