Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis

Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feat...

Bibliographic details
Published in: IEEE access 2021, Vol. 9, pp. 84296-84305
Main authors: Ali, Muhammad Z.; Ehsan-Ul-Haq; Rauf, Sahar; Javed, Kashif; Hussain, Sarmad
Format: Article
Language: eng
Online access: Full text
container_end_page 84305
container_issue
container_start_page 84296
container_title IEEE access
container_volume 9
creator Ali, Muhammad Z.
Ehsan-Ul-Haq
Rauf, Sahar
Javed, Kashif
Hussain, Sarmad
description Sentiment analysis is now widely used for customer review analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis of tweets faces challenges such as highly skewed classes, high-dimensional feature vectors and highly sparse data. In this study, we analyze the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. First, we prepared a comprehensive data set of Urdu tweets for sentiment analysis-based hate speech detection. To improve the performance of the sentiment classifier, we employed dynamic stop-word filtering, the Variable Global Feature Selection Scheme (VGFSS) and the Synthetic Minority Over-sampling Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems, respectively. We used two machine learning algorithms, Support Vector Machines (SVM) and Multinomial Naïve Bayes (MNB), in our experiments. Our results show that addressing class skew together with alleviating the high-dimensionality problem yields the largest improvement in the overall performance of sentiment analysis-based hate speech detection.
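The class-imbalance step named in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the authors' implementation: the function name `smote`, its parameters and the toy count vectors are hypothetical, and the code only shows the core idea of creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base row (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Invented toy term-count vectors: 6 majority rows versus 3 minority rows.
majority = [[1, 0, 0], [2, 0, 1], [1, 1, 0], [3, 0, 0], [2, 1, 0], [1, 0, 1]]
minority = [[0, 2, 3], [0, 3, 2], [1, 2, 2]]

# Generate just enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because each synthetic row lies on the line segment between two existing minority rows, the oversampled class stays inside the region already occupied by the minority data. In practice such resampling would be applied to the training split only, so that the evaluation data keeps the original class skew.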
doi_str_mv 10.1109/ACCESS.2021.3087827
format Article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_proquest_journals_2541468182</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9449874</ieee_id><doaj_id>oai_doaj_org_article_dc350467f15c48aa9fd44156b7911797</doaj_id><sourcerecordid>2541468182</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</originalsourceid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2541468182</pqid></control><display><type>article</type><title>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Ali, Muhammad Z. ; Ehsan-Ul-Haq ; Rauf, Sahar ; Javed, Kashif ; Hussain, Sarmad</creator><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3087827</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Blogs ; Data mining ; data sparsity ; Feature extraction ; Hate speech ; high-dimensional feature vector ; highly skewed classes ; Machine learning ; Machine learning algorithms ; Optimization ; Optimization techniques ; Performance enhancement ; Semantics ; Sentiment analysis ; Social networking (online) ; Support vector machines ; Vocabulary</subject><ispartof>IEEE access, 2021, Vol.9, p.84296-84305</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><orcidid>0000-0003-2558-1772</orcidid></display><links><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9449874$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml></links><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ali, Muhammad Z.</au><au>Ehsan-Ul-Haq</au><au>Rauf, Sahar</au><au>Javed, Kashif</au><au>Hussain, Sarmad</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>84296</spage><epage>84305</epage><pages>84296-84305</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3087827</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-2558-1772</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2021, Vol.9, p.84296-84305
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_2541468182
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects Algorithms
Blogs
Data mining
data sparsity
Feature extraction
Hate speech
high-dimensional feature vector
highly skewed classes
Machine learning
Machine learning algorithms
Optimization
Optimization techniques
Performance enhancement
Semantics
Sentiment analysis
Social networking (online)
Support vector machines
Vocabulary
title Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T19%3A42%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Hate%20Speech%20Detection%20of%20Urdu%20Tweets%20Using%20Sentiment%20Analysis&rft.jtitle=IEEE%20access&rft.au=Ali,%20Muhammad%20Z.&rft.date=2021&rft.volume=9&rft.spage=84296&rft.epage=84305&rft.pages=84296-84305&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3087827&rft_dat=%3Cproquest_doaj_%3E2541468182%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2541468182&rft_id=info:pmid/&rft_ieee_id=9449874&rft_doaj_id=oai_doaj_org_article_dc350467f15c48aa9fd44156b7911797&rfr_iscdi=true