Twitter Benchmark Dataset for Arabic Sentiment Analysis

Sentiment classification is one of the fastest-growing research areas in sentiment analysis and text mining, especially given the massive volume of opinions available on social media. Recent results have demonstrated that no single strategy can consistently achieve the best prediction perfor...

Bibliographic details
Published in:	International journal of modern education and computer science 2019-01, Vol.11 (1), p.33-38
Main authors:	Gamal, Donia; Alfonse, Marco; M.El-Horbaty, El-Sayed; M.Salem, Abdel-Badeeh
Format:	Article
Language:	eng
Subjects:	Algorithms; Arabic language; Benchmarks; Data mining; Datasets; Dialect Studies; Dialects; Digital media; Machine learning; Regression analysis; Semitic Languages; Sentiment analysis; Social networks
Online access:	Full text

container_end_page	38
container_issue	1
container_start_page	33
container_title	International journal of modern education and computer science
container_volume	11
creator	Gamal, Donia; Alfonse, Marco; M.El-Horbaty, El-Sayed; M.Salem, Abdel-Badeeh
description	Sentiment classification is one of the fastest-growing research areas in sentiment analysis and text mining, especially given the massive volume of opinions available on social media. Recent results have demonstrated that no single strategy consistently achieves the best prediction performance across datasets. Research on Arabic sentiment analysis is scarce compared with English, because the unique nature and difficulty of the Arabic language have led to a shortage of Arabic datasets for sentiment analysis. This paper proposes an Arabic benchmark dataset for sentiment analysis and describes the methodology used to gather the most recent tweets in different Arabic dialects. The dataset contains more than 151,000 distinct opinions in various Arabic dialects, labeled into two balanced classes: positive and negative. Different machine learning algorithms are applied to this dataset, including ridge regression, which gives the highest accuracy of 99.90%.
doi_str_mv	10.5815/ijmecs.2019.01.04
format	Article
publisher	Hong Kong: Modern Education and Computer Science Press
creatorcontrib	Computer Science Department, Faculty of computer and information sciences, Ain Shams University, Cairo, Egypt
eissn	2075-017X
tpages	6
rights	2019. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the associated terms available at http://www.mecs-press.org/ijcnis/terms.html
oa	free_for_read
sourceid	proquest_cross
pqid	2193196171
fulltext	fulltext
identifier	ISSN: 2075-0161
ispartof	International journal of modern education and computer science, 2019-01, Vol.11 (1), p.33-38
issn	2075-0161 2075-017X
language	eng
recordid	cdi_proquest_journals_2193196171
source	Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects	Algorithms; Arabic language; Benchmarks; Data mining; Datasets; Dialect Studies; Dialects; Digital media; Machine learning; Regression analysis; Semitic Languages; Sentiment analysis; Social networks
title	Twitter Benchmark Dataset for Arabic Sentiment Analysis
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T08%3A00%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Twitter%20Benchmark%20Dataset%20for%20Arabic%20Sentiment%20Analysis&rft.jtitle=International%20journal%20of%20modern%20education%20and%20computer%20science&rft.au=Gamal,%20Donia&rft.aucorp=Computer%20Science%20Department,%20Faculty%20of%20computer%20and%20information%20sciences,%20Ain%20Shams%20University,%20Cairo,%20Egypt&rft.date=2019-01-01&rft.volume=11&rft.issue=1&rft.spage=33&rft.epage=38&rft.pages=33-38&rft.issn=2075-0161&rft.eissn=2075-017X&rft_id=info:doi/10.5815/ijmecs.2019.01.04&rft_dat=%3Cproquest_cross%3E2193196171%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2193196171&rft_id=info:pmid/&rfr_iscdi=true
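
The url field above is an OpenURL whose query string carries the citation as key/value pairs (the keys prefixed with rft. describe the referenced article). As a minimal sketch, assuming only Python's standard library, the citation fields can be recovered from such a link as follows. The helper names parse_openurl and citation_fields are hypothetical, and the example URL is a shortened form of the one in this record, not the full link.

```python
from urllib.parse import urlsplit, parse_qs


def parse_openurl(url: str) -> dict[str, list[str]]:
    """Return the query parameters of an OpenURL as key -> list of decoded values."""
    query = urlsplit(url).query
    # keep_blank_values preserves keys that are present but empty
    return parse_qs(query, keep_blank_values=True)


def citation_fields(params: dict[str, list[str]]) -> dict[str, str]:
    """Keep only the 'rft.*' keys, strip the prefix, and take the first value of each."""
    prefix = "rft."
    return {
        key[len(prefix):]: values[0]
        for key, values in params.items()
        if key.startswith(prefix)
    }


# Shortened illustration built from parameters that appear in the record's url field.
example = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=Twitter%20Benchmark%20Dataset%20for%20Arabic%20Sentiment%20Analysis"
    "&rft.date=2019-01-01&rft.volume=11&rft.issue=1&rft.spage=33&rft.epage=38"
    "&rft.issn=2075-0161&rft_id=info:doi/10.5815/ijmecs.2019.01.04"
)

params = parse_openurl(example)
fields = citation_fields(params)

for name, value in fields.items():
    print(f"{name}\t{value}")
```

Note that identifiers such as the DOI travel under rft_id rather than under an rft. key, so they are read from the full parameter mapping (params["rft_id"]) instead of from the filtered citation fields.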