Effective Job Execution in Hadoop Over Authorized Deduplicated Data
Existing Hadoop treats every job as independent and discards the metadata of preceding jobs. Because every job is independent, it must repeatedly read its data from all DataNodes, and relationships between specific jobs are never checked. HDFS also lacks support for creating specific user identities, forming groups, and managing user credentials...
Saved in:
Published in: | Webology 2020-12, Vol.17 (2), p.430-444 |
---|---|
Main authors: | Thanekar, Sachin Arun; Subrahmanyam, K.; Bagwan, A.B. |
Format: | Article |
Language: | eng |
Subjects: | Algorithms; Datasets; Employment; Metadata |
Online access: | Full text |
container_end_page | 444 |
---|---|
container_issue | 2 |
container_start_page | 430 |
container_title | Webology |
container_volume | 17 |
creator | Thanekar, Sachin Arun; Subrahmanyam, K.; Bagwan, A.B. |
description | Existing Hadoop treats every job as independent and discards the metadata of preceding jobs. Because every job is independent, it must repeatedly read its data from all DataNodes, and relationships between specific jobs are never checked. HDFS also lacks support for creating specific user identities, forming groups, and managing user credentials. Together these weaknesses make overall Hadoop performance very poor. There is therefore a need to improve Hadoop performance through metadata reuse, better space management, better task execution via deduplication checks, and data security with access-rights specification. The proposed system uses a task-deduplication technique that checks the similarity between jobs by comparing block IDs. Job metadata and data-locality details are stored on the NameNode, which results in better job execution, and the metadata of executed jobs is preserved, so recomputation time can be saved. Experimental results show improved job execution time and reduced storage space, thus improving Hadoop performance. |
doi_str_mv | 10.14704/WEB/V17I2/WEB17043 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1735-188X |
ispartof | Webology, 2020-12, Vol.17 (2), p.430-444 |
issn | 1735-188X |
language | eng |
recordid | cdi_proquest_journals_2475936860 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Algorithms; Datasets; Employment; Metadata |
title | Effective Job Execution in Hadoop Over Authorized Deduplicated Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T20%3A36%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Effective%20Job%20Execution%20in%20Hadoop%20Over%20Authorized%20Deduplicated%20Data&rft.jtitle=Webology&rft.au=Thanekar,%20Sachin%20Arun&rft.date=2020-12-01&rft.volume=17&rft.issue=2&rft.spage=430&rft.epage=444&rft.pages=430-444&rft.issn=1735-188X&rft.eissn=1735-188X&rft_id=info:doi/10.14704/WEB/V17I2/WEB17043&rft_dat=%3Cproquest_cross%3E2475936860%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2475936860&rft_id=info:pmid/&rfr_iscdi=true |
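The task-deduplication idea described in the abstract, reusing the preserved metadata of an executed job when a new job of the same kind runs over the same input block IDs, can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation; all names here (`JobMetadataCache`, `run_job`, the block IDs) are hypothetical.

```python
# Sketch: task deduplication by block-ID comparison, with executed-job
# metadata preserved for reuse (as the abstract attributes to the NameNode).
from typing import Callable, Dict, FrozenSet, Tuple


class JobMetadataCache:
    """Caches results of executed jobs, keyed by (job kind, input block IDs)."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, FrozenSet[str]], object] = {}

    def run_job(self, kind: str, block_ids: FrozenSet[str],
                compute: Callable[[], object]) -> Tuple[object, bool]:
        """Return (result, reused). If an identical job already ran over the
        same blocks, reuse its preserved result instead of recomputing."""
        key = (kind, block_ids)
        if key in self._cache:
            return self._cache[key], True   # deduplicated: skip recomputation
        result = compute()                  # would read the blocks from DataNodes
        self._cache[key] = result           # preserve job metadata for reuse
        return result, False


# Usage: two jobs of the same kind over the same blocks; the second is reused.
cache = JobMetadataCache()
blocks = frozenset({"blk_1073741825", "blk_1073741826"})
r1, reused1 = cache.run_job("wordcount", blocks, lambda: {"words": 1000})
r2, reused2 = cache.run_job("wordcount", blocks, lambda: {"words": 1000})
```

A job with a different kind or a different block set misses the cache and is computed normally; only an exact match on both is treated as a duplicate, mirroring the "similarity between jobs by checking block ids" criterion in the abstract.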