Fake News Detection in Arabic Tweets during the COVID-19 Pandemic

In March 2020, the World Health Organization declared the COVID-19 outbreak to be a pandemic. Soon afterwards, people began sharing millions of posts on social media without considering their reliability and truthfulness. While there has been extensive research on COVID-19 in the English language,...

Bibliographic details
Published in: International journal of advanced computer science & applications 2021, Vol.12 (6)
Main authors: Mahlous, Ahmed Redha; Al-Laith, Ali
Format: Article
Language: eng
Online access: Full text
container_end_page
container_issue 6
container_start_page
container_title International journal of advanced computer science & applications
container_volume 12
creator Mahlous, Ahmed Redha
Al-Laith, Ali
description In March 2020, the World Health Organization declared the COVID-19 outbreak to be a pandemic. Soon afterwards, people began sharing millions of posts on social media without considering their reliability and truthfulness. While there has been extensive research on COVID-19 in the English language, there is a lack of research on the subject in Arabic. In this paper, we address the problem of detecting fake news surrounding COVID-19 in Arabic tweets. We collected more than seven million Arabic tweets related to the corona virus pandemic from January 2020 to August 2020 using the trending hashtags during the time of pandemic. We relied on two fact-checkers: the France-Press Agency and the Saudi Anti-Rumors Authority to extract a list of keywords related to the misinformation and fake news topics. A small corpus was extracted from the collected tweets and manually annotated into fake or genuine classes. We used a set of features extracted from tweet contents to train a set of machine learning classifiers. The manually annotated corpus was used as a baseline to build a system for automatically detecting fake news from Arabic text. Classification of the manually annotated dataset achieved an F1-score of 87.8% using Logistic Regression (LR) as a classifier with the n-gram-level Term Frequency-Inverse Document Frequency (TF-IDF) as a feature, and a 93.3% F1-score on the automatically annotated dataset using the same classifier with count vector feature. The introduced system and datasets could help governments, decision-makers, and the public judge the credibility of information published on social media during the COVID-19 pandemic.
doi_str_mv 10.14569/IJACSA.2021.0120691
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2655116427</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2655116427</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2401-5f77617a79355f9f747067c8349ee0460d8f0a705c420ed228b2d823b0fbf51b3</originalsourceid><addsrcrecordid>eNotkE1LAzEYhIMoWGr_gYeA561vvjfHZWu1UqxgFW8hm010q92tyZbiv7e2ncvMYZiBB6FrAmPChdS3s8eifCnGFCgZA6EgNTlDA0qEzIRQcH7IeUZAvV-iUUor2ItpKnM2QMXUfnn85HcJT3zvXd90LW5aXERbNQ4vd973Cdfb2LQfuP_0uFy8zSYZ0fjZtrVfN-4KXQT7nfzo5EP0Or1blg_ZfHE_K4t55igHkomglCTKKs2ECDoorkAqlzOuvQcuoc4DWAXCcQq-pjSvaJ1TVkGogiAVG6Kb4-4mdj9bn3qz6rax3V8aKoUgRHKq9i1-bLnYpRR9MJvYrG38NQTMgZc58jL_vMyJF_sDTUVaPg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2655116427</pqid></control><display><type>article</type><title>Fake News Detection in Arabic Tweets during the COVID-19 Pandemic</title><source>EZB-FREE-00999 freely available EZB journals</source><creator>Mahlous, Ahmed Redha ; Al-Laith, Ali</creator><creatorcontrib>Mahlous, Ahmed Redha ; Al-Laith, Ali</creatorcontrib><description>In March 2020, the World Health Organization declared the COVID-19 outbreak to be a pandemic. Soon af-terwards, people began sharing millions of posts on social media without considering their reliability and truthfulness. While there has been extensive research on COVID-19 in the English lan-guage, there is a lack of research on the subject in Arabic. In this paper, we address the problem of detecting fake news surrounding COVID-19 in Arabic tweets. We collected more than seven million Arabic tweets related to the corona virus pandemic from January 2020 to August 2020 using the trending hashtags during the time of pandemic. We relied on two fact-checkers: the France-Press Agency and the Saudi Anti-Rumors Authority to extract a list of keywords related to the misinformation and fake news topics. 
A small corpus was extracted from the collected tweets and manually annotated into fake or genuine classes. We used a set of features extracted from tweet contents to train a set of machine learning classifiers. The manually annotated corpus was used as a baseline to build a system for automatically detecting fake news from Arabic text. Classification of the manually annotated dataset achieved an F1-score of 87.8% using Logistic Regression (LR) as a classifier with the n-gram-level Term Frequency-Inverse Document Frequency (TF-IDF) as a feature, and a 93.3% F1-score on the automatically annotated dataset using the same classifier with count vector feature. The introduced system and datasets could help governments, decision-makers, and the public judge the credibility of information published on social media during the COVID-19 pandemic.</description><identifier>ISSN: 2158-107X</identifier><identifier>EISSN: 2156-5570</identifier><identifier>DOI: 10.14569/IJACSA.2021.0120691</identifier><language>eng</language><publisher>West Yorkshire: Science and Information (SAI) Organization Limited</publisher><subject>Annotations ; Classifiers ; Coronaviruses ; COVID-19 ; Datasets ; Decision making ; Digital media ; Feature extraction ; Machine learning ; News ; Pandemics ; Social networks ; Viral diseases</subject><ispartof>International journal of advanced computer science &amp; applications, 2021, Vol.12 (6)</ispartof><rights>2021. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c2401-5f77617a79355f9f747067c8349ee0460d8f0a705c420ed228b2d823b0fbf51b3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,4024,27923,27924,27925</link.rule.ids></links><search><creatorcontrib>Mahlous, Ahmed Redha</creatorcontrib><creatorcontrib>Al-Laith, Ali</creatorcontrib><title>Fake News Detection in Arabic Tweets during the COVID-19 Pandemic</title><title>International journal of advanced computer science &amp; applications</title><description>In March 2020, the World Health Organization declared the COVID-19 outbreak to be a pandemic. Soon af-terwards, people began sharing millions of posts on social media without considering their reliability and truthfulness. While there has been extensive research on COVID-19 in the English lan-guage, there is a lack of research on the subject in Arabic. In this paper, we address the problem of detecting fake news surrounding COVID-19 in Arabic tweets. We collected more than seven million Arabic tweets related to the corona virus pandemic from January 2020 to August 2020 using the trending hashtags during the time of pandemic. We relied on two fact-checkers: the France-Press Agency and the Saudi Anti-Rumors Authority to extract a list of keywords related to the misinformation and fake news topics. A small corpus was extracted from the collected tweets and manually annotated into fake or genuine classes. We used a set of features extracted from tweet contents to train a set of machine learning classifiers. The manually annotated corpus was used as a baseline to build a system for automatically detecting fake news from Arabic text. 
Classification of the manually annotated dataset achieved an F1-score of 87.8% using Logistic Regression (LR) as a classifier with the n-gram-level Term Frequency-Inverse Document Frequency (TF-IDF) as a feature, and a 93.3% F1-score on the automatically annotated dataset using the same classifier with count vector feature. The introduced system and datasets could help governments, decision-makers, and the public judge the credibility of information published on social media during the COVID-19 pandemic.</description><subject>Annotations</subject><subject>Classifiers</subject><subject>Coronaviruses</subject><subject>COVID-19</subject><subject>Datasets</subject><subject>Decision making</subject><subject>Digital media</subject><subject>Feature extraction</subject><subject>Machine learning</subject><subject>News</subject><subject>Pandemics</subject><subject>Social networks</subject><subject>Viral diseases</subject><issn>2158-107X</issn><issn>2156-5570</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>8G5</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>GUQSH</sourceid><sourceid>M2O</sourceid><recordid>eNotkE1LAzEYhIMoWGr_gYeA561vvjfHZWu1UqxgFW8hm010q92tyZbiv7e2ncvMYZiBB6FrAmPChdS3s8eifCnGFCgZA6EgNTlDA0qEzIRQcH7IeUZAvV-iUUor2ItpKnM2QMXUfnn85HcJT3zvXd90LW5aXERbNQ4vd973Cdfb2LQfuP_0uFy8zSYZ0fjZtrVfN-4KXQT7nfzo5EP0Or1blg_ZfHE_K4t55igHkomglCTKKs2ECDoorkAqlzOuvQcuoc4DWAXCcQq-pjSvaJ1TVkGogiAVG6Kb4-4mdj9bn3qz6rax3V8aKoUgRHKq9i1-bLnYpRR9MJvYrG38NQTMgZc58jL_vMyJF_sDTUVaPg</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Mahlous, Ahmed Redha</creator><creator>Al-Laith, Ali</creator><general>Science and Information (SAI) Organization 
Limited</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7XB</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8G5</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>COVID</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>GUQSH</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>M2O</scope><scope>MBDVC</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>2021</creationdate><title>Fake News Detection in Arabic Tweets during the COVID-19 Pandemic</title><author>Mahlous, Ahmed Redha ; Al-Laith, Ali</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2401-5f77617a79355f9f747067c8349ee0460d8f0a705c420ed228b2d823b0fbf51b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Annotations</topic><topic>Classifiers</topic><topic>Coronaviruses</topic><topic>COVID-19</topic><topic>Datasets</topic><topic>Decision making</topic><topic>Digital media</topic><topic>Feature extraction</topic><topic>Machine learning</topic><topic>News</topic><topic>Pandemics</topic><topic>Social networks</topic><topic>Viral diseases</topic><toplevel>online_resources</toplevel><creatorcontrib>Mahlous, Ahmed Redha</creatorcontrib><creatorcontrib>Al-Laith, Ali</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Research Library (Alumni Edition)</collection><collection>ProQuest Central 
(Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>Research Library Prep</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Research Library</collection><collection>Research Library (Corporate)</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>International journal of advanced computer science &amp; applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mahlous, Ahmed Redha</au><au>Al-Laith, Ali</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Fake News Detection in Arabic Tweets during the COVID-19 Pandemic</atitle><jtitle>International journal of advanced computer science &amp; applications</jtitle><date>2021</date><risdate>2021</risdate><volume>12</volume><issue>6</issue><issn>2158-107X</issn><eissn>2156-5570</eissn><abstract>In March 2020, the World Health Organization declared the COVID-19 outbreak to be a 
pandemic. Soon afterwards, people began sharing millions of posts on social media without considering their reliability and truthfulness. While there has been extensive research on COVID-19 in the English language, there is a lack of research on the subject in Arabic. In this paper, we address the problem of detecting fake news surrounding COVID-19 in Arabic tweets. We collected more than seven million Arabic tweets related to the corona virus pandemic from January 2020 to August 2020 using the trending hashtags during the time of pandemic. We relied on two fact-checkers: the France-Press Agency and the Saudi Anti-Rumors Authority to extract a list of keywords related to the misinformation and fake news topics. A small corpus was extracted from the collected tweets and manually annotated into fake or genuine classes. We used a set of features extracted from tweet contents to train a set of machine learning classifiers. The manually annotated corpus was used as a baseline to build a system for automatically detecting fake news from Arabic text. Classification of the manually annotated dataset achieved an F1-score of 87.8% using Logistic Regression (LR) as a classifier with the n-gram-level Term Frequency-Inverse Document Frequency (TF-IDF) as a feature, and a 93.3% F1-score on the automatically annotated dataset using the same classifier with count vector feature. The introduced system and datasets could help governments, decision-makers, and the public judge the credibility of information published on social media during the COVID-19 pandemic.</abstract><cop>West Yorkshire</cop><pub>Science and Information (SAI) Organization Limited</pub><doi>10.14569/IJACSA.2021.0120691</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2158-107X
ispartof International journal of advanced computer science & applications, 2021, Vol.12 (6)
issn 2158-107X
2156-5570
language eng
recordid cdi_proquest_journals_2655116427
source EZB-FREE-00999 freely available EZB journals
subjects Annotations
Classifiers
Coronaviruses
COVID-19
Datasets
Decision making
Digital media
Feature extraction
Machine learning
News
Pandemics
Social networks
Viral diseases
title Fake News Detection in Arabic Tweets during the COVID-19 Pandemic
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T02%3A32%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fake%20News%20Detection%20in%20Arabic%20Tweets%20during%20the%20COVID-19%20Pandemic&rft.jtitle=International%20journal%20of%20advanced%20computer%20science%20&%20applications&rft.au=Mahlous,%20Ahmed%20Redha&rft.date=2021&rft.volume=12&rft.issue=6&rft.issn=2158-107X&rft.eissn=2156-5570&rft_id=info:doi/10.14569/IJACSA.2021.0120691&rft_dat=%3Cproquest_cross%3E2655116427%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2655116427&rft_id=info:pmid/&rfr_iscdi=true
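
The description above names two technical ingredients: n-gram-level TF-IDF features and the F1-score used to report results. The following is a minimal sketch of both in plain Python, not the authors' implementation. It assumes whitespace tokenisation, word n-grams up to length 2, and the textbook weighting tf * log(N / df); the record does not state the paper's exact tokeniser, n-gram range, or TF-IDF variant. The function names (`word_ngrams`, `tfidf`, `f1_score`) and the toy inputs are hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max (assumed: whitespace tokens)."""
    tokens = text.split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=2):
    """Textbook TF-IDF per document: (count / doc length) * log(N / df).

    This unsmoothed variant is an assumption; libraries often use
    smoothed or normalised versions that give different numbers.
    """
    grams_per_doc = [word_ngrams(d, n_max) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))
    n_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        vectors.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return vectors


def f1_score(y_true, y_pred, positive="fake"):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up inputs for illustration only.
vectors = tfidf(["a b", "a c"])
score = f1_score(
    ["fake", "fake", "genuine", "genuine"],
    ["fake", "genuine", "fake", "genuine"],
)
```

In this sketch an n-gram that occurs in every document gets weight zero, which is why TF-IDF favours terms that distinguish one tweet from another. A classifier such as Logistic Regression would then be trained on these sparse vectors; that step is omitted here.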