Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis
Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feat...
Saved in:
Published in: | IEEE access 2021, Vol.9, p.84296-84305 |
---|---|
Main authors: | Ali, Muhammad Z.; Ehsan-Ul-Haq; Rauf, Sahar; Javed, Kashif; Hussain, Sarmad |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
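The abstract of this record names SMOTE (commonly expanded as Synthetic Minority Over-sampling Technique) as the remedy for class imbalance. The sketch below illustrates the general idea only, under stated assumptions: it is not the authors' implementation, the function name `smote`, its parameters, and the toy two-dimensional points are all hypothetical, and real tweet features would be high-dimensional sparse vectors. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to it.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours and interpolate towards it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # fraction of the way from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 4 minority samples against 10 majority samples.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
majority_count = 10
new_points = smote(minority, majority_count - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region the minority class already occupies. In practice, oversampling of this kind is applied to the training split only, so that no synthetic data leaks into evaluation.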
container_end_page | 84305 |
---|---|
container_issue | |
container_start_page | 84296 |
container_title | IEEE access |
container_volume | 9 |
creator | Ali, Muhammad Z.; Ehsan-Ul-Haq; Rauf, Sahar; Javed, Kashif; Hussain, Sarmad |
description | Sentiment analysis is now widely used for customer review analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high-dimensional feature vectors and highly sparse data. In this study, we analyzed the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. First, we prepared a comprehensive data set of Urdu tweets for sentiment analysis-based hate speech detection. To improve the performance of the sentiment classifier, we employed dynamic stop-word filtering, the Variable Global Feature Selection Scheme (VGFSS) and the Synthetic Minority Over-sampling Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems, respectively. We used two machine learning algorithms, namely Support Vector Machines (SVM) and Multinomial Naïve Bayes (MNB), to investigate performance in our experiments. Our results show that addressing class skew together with alleviating the high-dimensionality problem brings about the maximum improvement in the overall performance of sentiment analysis-based hate speech detection. |
doi_str_mv | 10.1109/ACCESS.2021.3087827 |
format | Article |
fullrecord | <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_proquest_journals_2541468182</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9449874</ieee_id><doaj_id>oai_doaj_org_article_dc350467f15c48aa9fd44156b7911797</doaj_id><sourcerecordid>2541468182</sourcerecordid><originalsourceid>FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</originalsourceid><addsrcrecordid>eNpNUVtrwjAULmODifMX-BLYc12uTfIozqkg7KH6HNI0cRVtXRI39u8XV5HlQE44fBdyviwbIzhBCMqX6Ww2L8sJhhhNCBRcYH6XDTAqZE4YKe7_vR-zUQh7mI5II8YH2WJ1PPnuq2l3YKmjBeXJWvMBXm20JjZdCzoHtr4-g823tTGAbbhAS9vG5pguMG314Sc04Sl7cPoQ7Ojah9n2bb6ZLfP1-2I1m65zQ6GIOZMOmUIbbRmGDnJqJCa4rnSBCILCEVdRolNVVBCNqWa4plwayLl2hUBkmK163brTe3XyzVH7H9XpRv0NOr9T2sfGHKyqDWGQFtwhZqjQWrqaUsSKikuEuORJ67nXShv4PNsQ1b47-_ShoDCjiCY_gROK9CjjuxC8dTdXBNUlANUHoC4BqGsAiTXuWY219saQlErBKfkFGmF_fw</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2541468182</pqid></control><display><type>article</type><title>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Ali, Muhammad Z. ; Ehsan-Ul-Haq ; Rauf, Sahar ; Javed, Kashif ; Hussain, Sarmad</creator><creatorcontrib>Ali, Muhammad Z. ; Ehsan-Ul-Haq ; Rauf, Sahar ; Javed, Kashif ; Hussain, Sarmad</creatorcontrib><description>Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feature vectors and highly sparse data. 
In this study, we have analyzed the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. Firstly, we prepared a comprehensive data set consisting of Urdu Tweets for sentiment analysis-based hate speech detection. To improve the performance of the sentiment classifier, we employed dynamic stop words filtering, Variable Global Feature Selection Scheme (VGFSS) and Synthetic Minority Optimization Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems respectively. We used two machine learning algorithms i.e., Support Vector Machines (SVM) and Multinomial Naïve Bayes' (MNB) for investigating performance in our experiments. Our results show that addressing class skew along with alleviating the high dimensionality problem brings about the maximum improvement in the overall performance of the sentiment analysis-based hate speech detection.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3087827</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Algorithms ; Blogs ; Data mining ; data sparsity ; Feature extraction ; Hate speech ; high-dimensional feature vector ; highly skewed classes ; Machine learning ; Machine learning algorithms ; Optimization ; Optimization techniques ; Performance enhancement ; Semantics ; Sentiment analysis ; Social networking (online) ; Support vector machines ; Vocabulary</subject><ispartof>IEEE access, 2021, Vol.9, p.84296-84305</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</citedby><cites>FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</cites><orcidid>0000-0003-2558-1772</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9449874$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,860,2095,4009,27612,27902,27903,27904,54912</link.rule.ids></links><search><creatorcontrib>Ali, Muhammad Z.</creatorcontrib><creatorcontrib>Ehsan-Ul-Haq</creatorcontrib><creatorcontrib>Rauf, Sahar</creatorcontrib><creatorcontrib>Javed, Kashif</creatorcontrib><creatorcontrib>Hussain, Sarmad</creatorcontrib><title>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</title><title>IEEE access</title><addtitle>Access</addtitle><description>Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feature vectors and highly sparse data. In this study, we have analyzed the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. Firstly, we prepared a comprehensive data set consisting of Urdu Tweets for sentiment analysis-based hate speech detection. 
To improve the performance of the sentiment classifier, we employed dynamic stop words filtering, Variable Global Feature Selection Scheme (VGFSS) and Synthetic Minority Optimization Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems respectively. We used two machine learning algorithms i.e., Support Vector Machines (SVM) and Multinomial Naïve Bayes' (MNB) for investigating performance in our experiments. Our results show that addressing class skew along with alleviating the high dimensionality problem brings about the maximum improvement in the overall performance of the sentiment analysis-based hate speech detection.</description><subject>Algorithms</subject><subject>Blogs</subject><subject>Data mining</subject><subject>data sparsity</subject><subject>Feature extraction</subject><subject>Hate speech</subject><subject>high-dimensional feature vector</subject><subject>highly skewed classes</subject><subject>Machine learning</subject><subject>Machine learning algorithms</subject><subject>Optimization</subject><subject>Optimization techniques</subject><subject>Performance enhancement</subject><subject>Semantics</subject><subject>Sentiment analysis</subject><subject>Social networking (online)</subject><subject>Support vector 
machines</subject><subject>Vocabulary</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUVtrwjAULmODifMX-BLYc12uTfIozqkg7KH6HNI0cRVtXRI39u8XV5HlQE44fBdyviwbIzhBCMqX6Ww2L8sJhhhNCBRcYH6XDTAqZE4YKe7_vR-zUQh7mI5II8YH2WJ1PPnuq2l3YKmjBeXJWvMBXm20JjZdCzoHtr4-g823tTGAbbhAS9vG5pguMG314Sc04Sl7cPoQ7Ojah9n2bb6ZLfP1-2I1m65zQ6GIOZMOmUIbbRmGDnJqJCa4rnSBCILCEVdRolNVVBCNqWa4plwayLl2hUBkmK163brTe3XyzVH7H9XpRv0NOr9T2sfGHKyqDWGQFtwhZqjQWrqaUsSKikuEuORJ67nXShv4PNsQ1b47-_ShoDCjiCY_gROK9CjjuxC8dTdXBNUlANUHoC4BqGsAiTXuWY219saQlErBKfkFGmF_fw</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Ali, Muhammad Z.</creator><creator>Ehsan-Ul-Haq</creator><creator>Rauf, Sahar</creator><creator>Javed, Kashif</creator><creator>Hussain, Sarmad</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-2558-1772</orcidid></search><sort><creationdate>2021</creationdate><title>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</title><author>Ali, Muhammad Z. 
; Ehsan-Ul-Haq ; Rauf, Sahar ; Javed, Kashif ; Hussain, Sarmad</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c408t-59f1c6acae520f074c9232dba613108f3fb43a3a3b483a24a52d479c077af6813</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Algorithms</topic><topic>Blogs</topic><topic>Data mining</topic><topic>data sparsity</topic><topic>Feature extraction</topic><topic>Hate speech</topic><topic>high-dimensional feature vector</topic><topic>highly skewed classes</topic><topic>Machine learning</topic><topic>Machine learning algorithms</topic><topic>Optimization</topic><topic>Optimization techniques</topic><topic>Performance enhancement</topic><topic>Semantics</topic><topic>Sentiment analysis</topic><topic>Social networking (online)</topic><topic>Support vector machines</topic><topic>Vocabulary</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ali, Muhammad Z.</creatorcontrib><creatorcontrib>Ehsan-Ul-Haq</creatorcontrib><creatorcontrib>Rauf, Sahar</creatorcontrib><creatorcontrib>Javed, Kashif</creatorcontrib><creatorcontrib>Hussain, Sarmad</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and 
Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ali, Muhammad Z.</au><au>Ehsan-Ul-Haq</au><au>Rauf, Sahar</au><au>Javed, Kashif</au><au>Hussain, Sarmad</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>84296</spage><epage>84305</epage><pages>84296-84305</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Sentiment Analysis is a technique that is being used abundantly nowadays for customer reviews analysis, popularity analysis of electoral candidates, hate speech detection and similar applications. Sentiment analysis on tweets encounters challenges such as highly skewed classes, high dimensional feature vectors and highly sparse data. In this study, we have analyzed the improvement achieved by successively addressing these problems in order to determine their severity for sentiment analysis of tweets. Firstly, we prepared a comprehensive data set consisting of Urdu Tweets for sentiment analysis-based hate speech detection. To improve the performance of the sentiment classifier, we employed dynamic stop words filtering, Variable Global Feature Selection Scheme (VGFSS) and Synthetic Minority Optimization Technique (SMOTE) to handle the sparsity, dimensionality and class imbalance problems respectively. We used two machine learning algorithms i.e., Support Vector Machines (SVM) and Multinomial Naïve Bayes' (MNB) for investigating performance in our experiments. 
Our results show that addressing class skew along with alleviating the high dimensionality problem brings about the maximum improvement in the overall performance of the sentiment analysis-based hate speech detection.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3087827</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0003-2558-1772</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2021, Vol.9, p.84296-84305 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_2541468182 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; EZB-FREE-00999 freely available EZB journals |
subjects | Algorithms; Blogs; Data mining; data sparsity; Feature extraction; Hate speech; high-dimensional feature vector; highly skewed classes; Machine learning; Machine learning algorithms; Optimization; Optimization techniques; Performance enhancement; Semantics; Sentiment analysis; Social networking (online); Support vector machines; Vocabulary |
title | Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T19%3A42%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20Hate%20Speech%20Detection%20of%20Urdu%20Tweets%20Using%20Sentiment%20Analysis&rft.jtitle=IEEE%20access&rft.au=Ali,%20Muhammad%20Z.&rft.date=2021&rft.volume=9&rft.spage=84296&rft.epage=84305&rft.pages=84296-84305&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3087827&rft_dat=%3Cproquest_doaj_%3E2541468182%3C/proquest_doaj_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2541468182&rft_id=info:pmid/&rft_ieee_id=9449874&rft_doaj_id=oai_doaj_org_article_dc350467f15c48aa9fd44156b7911797&rfr_iscdi=true |
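The `url` field above is an OpenURL (its `ctx_ver` parameter reads `Z39.88-2004`) that carries the citation as `rft.*` query parameters. As a minimal sketch, assuming only Python's standard library, the citation can be recovered from such a link as follows. The URL in the example is an abbreviated copy of the one in the record, shortened for readability, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# Abbreviated copy of the record's resolver link (several parameters omitted).
openurl = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.atitle=Improving%20Hate%20Speech%20Detection%20of%20Urdu%20Tweets"
    "%20Using%20Sentiment%20Analysis"
    "&rft.jtitle=IEEE%20access&rft.date=2021&rft.volume=9"
    "&rft.spage=84296&rft.epage=84305&rft.issn=2169-3536"
    "&rft_id=info:doi/10.1109/ACCESS.2021.3087827"
)

# parse_qs percent-decodes the values and returns a list per key.
params = parse_qs(urlsplit(openurl).query)

# Keep the referent metadata ("rft." keys) and strip the prefix.
citation = {
    key[len("rft."):]: values[0]
    for key, values in params.items()
    if key.startswith("rft.")
}

# The DOI travels as an "info:doi/" identifier in rft_id.
doi = next(
    value.split("info:doi/", 1)[1]
    for value in params.get("rft_id", [])
    if value.startswith("info:doi/")
)

print(citation["atitle"], citation["jtitle"], citation["date"], doi)
```

`rft_id` may occur more than once in a full link, which is why the sketch scans the list for the `info:doi/` prefix instead of taking the first value.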