Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis

Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feat...
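
The full abstract in this record names SMOTE as the remedy for the highly skewed classes. SMOTE is commonly expanded as Synthetic Minority Over-sampling Technique. As a rough illustration of the idea only, and not the authors' implementation, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy vectors are all hypothetical and chosen for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Illustrative sketch of the SMOTE idea; all names here are hypothetical.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy term-count vectors (made-up data, not taken from the paper)
majority = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
minority = [[3, 0, 2], [4, 1, 2]]

# Oversample the minority class until both classes have the same size
balanced_minority = minority + smote(minority, len(majority) - len(minority))
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies. In practice, oversampling of this kind is normally applied to the training split only, so that no synthetic points leak into the evaluation data.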

Bibliographic Details
Published in:	IEEE access 2021, Vol.9, p.84296-84305
Main Authors:	Ali, Muhammad Z.; Ehsan-Ul-Haq; Rauf, Sahar; Javed, Kashif; Hussain, Sarmad
Format:	Article
Language:	eng
Subjects:	Algorithms; Blogs; Data mining; data sparsity; Feature extraction; Hate speech; high-dimensional feature vector; highly skewed classes; Machine learning; Machine learning algorithms; Optimization; Optimization techniques; Performance enhancement; Semantics; Sentiment analysis; Social networking (online); Support vector machines; Vocabulary
Online Access:	Full text

container_end_page	84305
container_issue
container_start_page	84296
container_title	IEEE access
container_volume	9
creator	Ali, Muhammad Z.; Ehsan-Ul-Haq; Rauf, Sahar; Javed, Kashif; Hussain, Sarmad
description	Sentiment analysis is now widely used for customer review analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis of tweets faces challenges such as highly skewed classes, high-dimensional feature vectors and highly sparse data. In this study, we analyze the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. First, we prepared a comprehensive data set of Urdu tweets for sentiment analysis-based hate speech detection. To improve the performance of the sentiment classifier, we employed dynamic stop word filtering, the Variable Global Feature Selection Scheme (VGFSS) and the Synthetic Minority Over-sampling Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems, respectively. We used two machine learning algorithms, Support Vector Machines (SVM) and Multinomial Naïve Bayes (MNB), in our experiments. Our results show that addressing class skew together with alleviating the high-dimensionality problem yields the largest improvement in the overall performance of sentiment analysis-based hate speech detection.
doi_str_mv	10.1109/ACCESS.2021.3087827
format	Article
fullrecord	<record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_proquest_journals_2541468182</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9449874</ieee_id><doaj_id>oai_doaj_org_article_dc350467f15c48aa9fd44156b7911797</doaj_id><sourcerecordid>2541468182</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</originalsourceid><addsrcrecordid>eNpNUVtrwjAULmODifMX-BLYc12uTfIozqkg7KH6HNI0cRVtXRI39u8XV5HlQE44fBdyviwbIzhBCMqX6Ww2L8sJhhhNCBRcYH6XDTAqZE4YKe7_vR-zUQh7mI5II8YH2WJ1PPnuq2l3YKmjBeXJWvMBXm20JjZdCzoHtr4-g823tTGAbbhAS9vG5pguMG314Sc04Sl7cPoQ7Ojah9n2bb6ZLfP1-2I1m65zQ6GIOZMOmUIbbRmGDnJqJCa4rnSBCILCEVdRolNVVBCNqWa4plwayLl2hUBkmK163brTe3XyzVH7H9XpRv0NOr9T2sfGHKyqDWGQFtwhZqjQWrqaUsSKikuEuORJ67nXShv4PNsQ1b47-_ShoDCjiCY_gROK9CjjuxC8dTdXBNUlANUHoC4BqGsAiTXuWY219saQlErBKfkFGmF_fw</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2541468182</pqid></control><display><type>article</type><title>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Ali, Muhammad Z. ; Ehsan-Ul-Haq ; Rauf, Sahar ; Javed, Kashif ; Hussain, Sarmad</creator><creatorcontrib>Ali, Muhammad Z. ; Ehsan-Ul-Haq ; Rauf, Sahar ; Javed, Kashif ; Hussain, Sarmad</creatorcontrib><description>Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feature vectors and highly sparse data. 
In this study, we have analyzed the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. Firstly, we prepared a comprehensive data set consisting of Urdu Tweets for sentiment analysis-based hate speech detection. To improve the performance of the sentiment classifier, we employed dynamic stop words filtering, Variable Global Feature Selection Scheme (VGFSS) and Synthetic Minority Optimization Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems respectively. We used two machine learning algorithms i.e., Support Vector Machines (SVM) and Multinomial Naïve Bayes' (MNB) for investigating performance in our experiments. Our results show that addressing class skew along with alleviating the high dimensionality problem brings about the maximum improvement in the overall performance of the sentiment analysis-based hate speech detection.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3087827</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Blogs ; Data mining ; data sparsity ; Feature extraction ; Hate speech ; high-dimensional feature vector ; highly skewed classes ; Machine learning ; Machine learning algorithms ; Optimization ; Optimization techniques ; Performance enhancement ; Semantics ; Sentiment analysis ; Social networking (online) ; Support vector machines ; Vocabulary</subject><ispartof>IEEE access, 2021, Vol.9, p.84296-84305</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</citedby><cites>FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</cites><orcidid>0000-0003-2558-1772</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9449874$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2095,4009,27612,27902,27903,27904,54912</link.rule.ids></links><search><creatorcontrib>Ali, Muhammad Z.</creatorcontrib><creatorcontrib>Ehsan-Ul-Haq</creatorcontrib><creatorcontrib>Rauf, Sahar</creatorcontrib><creatorcontrib>Javed, Kashif</creatorcontrib><creatorcontrib>Hussain, Sarmad</creatorcontrib><title>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</title><title>IEEE access</title><addtitle>Access</addtitle><description>Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feature vectors and highly sparse data. In this study, we have analyzed the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. Firstly, we prepared a comprehensive data set consisting of Urdu Tweets for sentiment analysis-based hate speech detection. 
To improve the performance of the sentiment classifier, we employed dynamic stop words filtering, Variable Global Feature Selection Scheme (VGFSS) and Synthetic Minority Optimization Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems respectively. We used two machine learning algorithms i.e., Support Vector Machines (SVM) and Multinomial Naïve Bayes' (MNB) for investigating performance in our experiments. Our results show that addressing class skew along with alleviating the high dimensionality problem brings about the maximum improvement in the overall performance of the sentiment analysis-based hate speech detection.</description><subject>Algorithms</subject><subject>Blogs</subject><subject>Data mining</subject><subject>data sparsity</subject><subject>Feature extraction</subject><subject>Hate speech</subject><subject>high-dimensional feature vector</subject><subject>highly skewed classes</subject><subject>Machine learning</subject><subject>Machine learning algorithms</subject><subject>Optimization</subject><subject>Optimization techniques</subject><subject>Performance enhancement</subject><subject>Semantics</subject><subject>Sentiment analysis</subject><subject>Social networking (online)</subject><subject>Support vector 
machines</subject><subject>Vocabulary</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUVtrwjAULmODifMX-BLYc12uTfIozqkg7KH6HNI0cRVtXRI39u8XV5HlQE44fBdyviwbIzhBCMqX6Ww2L8sJhhhNCBRcYH6XDTAqZE4YKe7_vR-zUQh7mI5II8YH2WJ1PPnuq2l3YKmjBeXJWvMBXm20JjZdCzoHtr4-g823tTGAbbhAS9vG5pguMG314Sc04Sl7cPoQ7Ojah9n2bb6ZLfP1-2I1m65zQ6GIOZMOmUIbbRmGDnJqJCa4rnSBCILCEVdRolNVVBCNqWa4plwayLl2hUBkmK163brTe3XyzVH7H9XpRv0NOr9T2sfGHKyqDWGQFtwhZqjQWrqaUsSKikuEuORJ67nXShv4PNsQ1b47-_ShoDCjiCY_gROK9CjjuxC8dTdXBNUlANUHoC4BqGsAiTXuWY219saQlErBKfkFGmF_fw</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Ali, Muhammad Z.</creator><creator>Ehsan-Ul-Haq</creator><creator>Rauf, Sahar</creator><creator>Javed, Kashif</creator><creator>Hussain, Sarmad</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-2558-1772</orcidid></search><sort><creationdate>2021</creationdate><title>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</title><author>Ali, Muhammad Z. 
; Ehsan-Ul-Haq ; Rauf, Sahar ; Javed, Kashif ; Hussain, Sarmad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Blogs</topic><topic>Data mining</topic><topic>data sparsity</topic><topic>Feature extraction</topic><topic>Hate speech</topic><topic>high-dimensional feature vector</topic><topic>highly skewed classes</topic><topic>Machine learning</topic><topic>Machine learning algorithms</topic><topic>Optimization</topic><topic>Optimization techniques</topic><topic>Performance enhancement</topic><topic>Semantics</topic><topic>Sentiment analysis</topic><topic>Social networking (online)</topic><topic>Support vector machines</topic><topic>Vocabulary</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ali, Muhammad Z.</creatorcontrib><creatorcontrib>Ehsan-Ul-Haq</creatorcontrib><creatorcontrib>Rauf, Sahar</creatorcontrib><creatorcontrib>Javed, Kashif</creatorcontrib><creatorcontrib>Hussain, Sarmad</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and 
Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ali, Muhammad Z.</au><au>Ehsan-Ul-Haq</au><au>Rauf, Sahar</au><au>Javed, Kashif</au><au>Hussain, Sarmad</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>84296</spage><epage>84305</epage><pages>84296-84305</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feature vectors and highly sparse data. In this study, we have analyzed the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. Firstly, we prepared a comprehensive data set consisting of Urdu Tweets for sentiment analysis-based hate speech detection. To improve the performance of the sentiment classifier, we employed dynamic stop words filtering, Variable Global Feature Selection Scheme (VGFSS) and Synthetic Minority Optimization Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems respectively. We used two machine learning algorithms i.e., Support Vector Machines (SVM) and Multinomial Naïve Bayes' (MNB) for investigating performance in our experiments. 
Our results show that addressing class skew along with alleviating the high dimensionality problem brings about the maximum improvement in the overall performance of the sentiment analysis-based hate speech detection.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3087827</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-2558-1772</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 2169-3536
ispartof	IEEE access, 2021, Vol.9, p.84296-84305
issn	2169-3536 2169-3536
language	eng
recordid	cdi_proquest_journals_2541468182
source	IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals
subjects	Algorithms; Blogs; Data mining; data sparsity; Feature extraction; Hate speech; high-dimensional feature vector; highly skewed classes; Machine learning; Machine learning algorithms; Optimization; Optimization techniques; Performance enhancement; Semantics; Sentiment analysis; Social networking (online); Support vector machines; Vocabulary
title	Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T19%3A42%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Hate%20Speech%20Detection%20of%20Urdu%20Tweets%20Using%20Sentiment%20Analysis&rft.jtitle=IEEE%20access&rft.au=Ali,%20Muhammad%20Z.&rft.date=2021&rft.volume=9&rft.spage=84296&rft.epage=84305&rft.pages=84296-84305&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3087827&rft_dat=%3Cproquest_doaj_%3E2541468182%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2541468182&rft_id=info:pmid/&rft_ieee_id=9449874&rft_doaj_id=oai_doaj_org_article_dc350467f15c48aa9fd44156b7911797&rfr_iscdi=true