Social Media and Stock Market Prediction: A Big Data Approach

Bibliographic details
Published in: Computers, materials & continua, 2021, Vol. 67 (2), p. 2569-2583
Main authors: Javed Awan, Mazhar, Shafry Mohd Rahim, Mohd, Nobanee, Haitham, Munawar, Ashna, Yasin, Awais, Mohd Zain Azlanmz, Azlan
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 2583
container_issue 2
container_start_page 2569
container_title Computers, materials & continua
container_volume 67
creator Javed Awan, Mazhar
Shafry Mohd Rahim, Mohd
Nobanee, Haitham
Munawar, Ashna
Yasin, Awais
Mohd Zain Azlanmz, Azlan
description Big data is the collection of large datasets from traditional and digital sources to identify trends and patterns. The quantity and variety of computer data are growing exponentially for many reasons. For example, retailers are building vast databases of customer sales activity, organizations are working on logistics and financial services, and public social media are sharing a vast quantity of sentiments related to sales prices and products. Challenges of big data include volume and variety in both structured and unstructured data. In this paper, we implemented several machine learning models through Spark MLlib using PySpark, which is scalable, fast, easily integrated with other tools, and performs better than traditional models. We studied the stocks of 10 top companies, whose data include historical stock prices, with MLlib models such as linear regression, generalized linear regression, random forest, and decision tree. We also implemented naive Bayes and logistic regression classification models. Experimental results suggest that linear regression, random forest, and generalized linear regression provide an accuracy of 80%–98%. The decision tree did not predict share price movements in the stock market well.
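The abstract names the models but not the pipeline. As a hedged illustration of the simplest of them, linear regression, the stdlib-only Python sketch below fits the next day's closing price against the previous day's close by ordinary least squares, using a chronological train/test split. The one-day lag feature, the 70/30 split, the R² metric, the helper names (`make_lagged_pairs`, `fit_ols`, `r_squared`) and the price series are all hypothetical assumptions for illustration; the authors worked in Spark MLlib through PySpark, and this is not their code.

```python
"""Minimal sketch: one-lag linear regression for next-day closing prices.

Hypothetical illustration only. The paper used Spark MLlib models via
PySpark; its features, data and evaluation are not reproduced here.
"""
from typing import List, Tuple


def make_lagged_pairs(closes: List[float]) -> Tuple[List[float], List[float]]:
    """Pair each day's close (feature x) with the next day's close (target y)."""
    return closes[:-1], closes[1:]


def fit_ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two training pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs: List[float], ys: List[float], intercept: float, slope: float) -> float:
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot if ss_tot else 1.0


if __name__ == "__main__":
    # Hypothetical closing prices, not data from the paper.
    closes = [100.0, 101.5, 102.0, 101.0, 103.0, 104.5, 104.0, 105.5]
    x, y = make_lagged_pairs(closes)
    # Chronological split: train on earlier days, evaluate on later days.
    split = int(len(x) * 0.7)
    b0, b1 = fit_ols(x[:split], y[:split])
    print(f"next_close = {b0:.3f} + {b1:.3f} * close")
    print(f"R^2 on held-out days: {r_squared(x[split:], y[split:], b0, b1):.3f}")
```

In PySpark the analogous step would typically assemble features with `pyspark.ml.feature.VectorAssembler` and fit `pyspark.ml.regression.LinearRegression`; the plain-Python version is shown only so the arithmetic is visible. A chronological split is used rather than a random one so that the model is never trained on days that come after the ones it is evaluated on.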
doi_str_mv 10.32604/cmc.2021.014253
format Article
publisher Tech Science Press, Henderson
rights 2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
tpages 15
fulltext fulltext
identifier ISSN: 1546-2226
ispartof Computers, materials & continua, 2021, Vol.67 (2), p.2569-2583
issn 1546-2226
1546-2218
1546-2226
language eng
recordid cdi_proquest_journals_2691782649
source Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Big Data
Customer services
Decision trees
Digital media
Logistics
Machine learning
Regression
Regression analysis
Sales
Securities markets
Social networks
Unstructured data
title Social Media and Stock Market Prediction: A Big Data Approach
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T10%3A32%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Social%20Media%20and%20Stock%20Market%20Prediction:%20A%20Big%20Data%20Approach&rft.jtitle=Computers,%20materials%20&%20continua&rft.au=Javed%20Awan,%20Mazhar&rft.date=2021&rft.volume=67&rft.issue=2&rft.spage=2569&rft.epage=2583&rft.pages=2569-2583&rft.issn=1546-2226&rft.eissn=1546-2226&rft_id=info:doi/10.32604/cmc.2021.014253&rft_dat=%3Cproquest_cross%3E2691782649%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2691782649&rft_id=info:pmid/&rfr_iscdi=true