A Comparative Analysis of Hadoop and Spark Frameworks using Word Count Algorithm
With the advent of the Big Data explosion due to the Information Technology (IT) revolution during the last few decades, the need for processing and analyzing the data at low cost in minimum time has become immensely challenging. The field of Big Data analytics is driven by the demand to process Mac...
Published in: | International journal of advanced computer science & applications 2021, Vol.12 (4) |
---|---|
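Main authors: | BENLACHIMI, YASSINE ; EL, ABDELAZIZ ; LAHCEN, MOULAY |
Format: | Article |
Language: | eng |
Subjects: | Algorithms ; Big Data ; Computational efficiency ; Computing time ; Cost analysis ; Data analysis ; Data management ; Datasets ; Information technology ; Machine learning ; Performance evaluation ; Real time |
Online access: | Full text |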
container_end_page | |
---|---|
container_issue | 4 |
container_start_page | |
container_title | International journal of advanced computer science & applications |
container_volume | 12 |
creator | BENLACHIMI, YASSINE ; EL, ABDELAZIZ ; LAHCEN, MOULAY |
description | With the advent of the Big Data explosion driven by the Information Technology (IT) revolution of the last few decades, processing and analyzing data at low cost and in minimum time has become immensely challenging. The field of Big Data analytics is driven by the demand to process Machine Learning (ML) data, real-time streaming data, and graphics processing. The most efficient solutions for Big Data analysis in a distributed environment are Hadoop and Spark, both administered by Apache. Both are open-source data management frameworks that distribute and compute large datasets across multiple clusters of computing nodes. This paper provides a comprehensive comparison between Apache Hadoop and Apache Spark in terms of efficiency, scalability, security, cost-effectiveness, and other parameters. It describes the primary components of the Hadoop and Spark frameworks to compare their performance. The major conclusion is that Spark is better in terms of scalability and speed for real-time streaming applications, whereas Hadoop is more viable for applications dealing with bigger datasets. This case study evaluates the performance of various components of Hadoop, such as MapReduce and the Hadoop Distributed File System (HDFS), by applying them to the well-known Word Count algorithm to ascertain their efficacy in terms of storage and computational time. Subsequently, it also analyzes how Spark's in-memory processing could reduce the computational time of the Word Count algorithm. |
doi_str_mv | 10.14569/IJACSA.2021.0120495 |
format | Article |
publisher | West Yorkshire: Science and Information (SAI) Organization Limited |
rights | 2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
fulltext | fulltext |
identifier | ISSN: 2158-107X |
ispartof | International journal of advanced computer science & applications, 2021, Vol.12 (4) |
issn | 2158-107X ; 2156-5570 |
language | eng |
recordid | cdi_proquest_journals_2655119528 |
source | EZB-FREE-00999 freely available EZB journals |
subjects | Algorithms ; Big Data ; Computational efficiency ; Computing time ; Cost analysis ; Data analysis ; Data management ; Datasets ; Information technology ; Machine learning ; Performance evaluation ; Real time |
title | A Comparative Analysis of Hadoop and Spark Frameworks using Word Count Algorithm |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T16%3A16%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Comparative%20Analysis%20of%20Hadoop%20and%20Spark%20Frameworks%20using%20Word%20Count%20Algorithm&rft.jtitle=International%20journal%20of%20advanced%20computer%20science%20&%20applications&rft.au=BENLACHIMI,%20YASSINE&rft.date=2021&rft.volume=12&rft.issue=4&rft.issn=2158-107X&rft.eissn=2156-5570&rft_id=info:doi/10.14569/IJACSA.2021.0120495&rft_dat=%3Cproquest_cross%3E2655119528%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2655119528&rft_id=info:pmid/&rfr_iscdi=true |
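
The Word Count algorithm named in the description can be illustrated as the three classic MapReduce phases (map, shuffle, reduce). The following is only a minimal single-process sketch in plain Python, not the paper's implementation and not the Hadoop or Spark API: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and the tokenization (lowercasing plus a simple regular expression) is an assumption made for the example.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        # Assumed tokenization: lowercase, keep letters, digits, apostrophes.
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted counts under their word (the key)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases; everything here runs in one process."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop and Spark", "Spark and Hadoop and HDFS"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

In a real cluster the map and reduce steps would run in parallel on different nodes and the shuffle would move the grouped pairs across the network; the sketch keeps all three in memory only to show the data flow.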