Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation

With increasing globalization and immigration, various studies have estimated that about half of the world population is bilingual. Consequently, individuals concurrently use two or more languages or dialects in casual conversational settings. However, most research in natural language processing is...

Bibliographic details
Main authors:	Aryal, Saurav K; Prioleau, Howard; Washington, Gloria
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computation and Language

creator	Aryal, Saurav K; Prioleau, Howard; Washington, Gloria
description	With increasing globalization and immigration, various studies have estimated that about half of the world population is bilingual. Consequently, individuals concurrently use two or more languages or dialects in casual conversational settings. However, most research in natural language processing is focused on monolingual text. To further the work in code-switched sentiment analysis, we propose a multi-step natural language processing algorithm utilizing points of code-switching in mixed text and conduct sentiment analysis around those identified points. The proposed sentiment analysis algorithm uses semantic similarity derived from large pre-trained multilingual models with a handcrafted set of positive and negative words to determine the polarity of code-switched text. The proposed approach outperforms a comparable baseline model by 11.2% for accuracy and 11.64% for F1-score on a Spanish-English dataset. Theoretically, the proposed algorithm can be expanded for sentiment analysis of multiple languages with limited human expertise.
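The description outlines a pipeline: find the points where the text switches language, segment the text around them, and score each segment by its semantic similarity to handcrafted positive and negative words. The Python sketch below illustrates that general idea only; it is not the authors' implementation. Everything in it is an assumption: the 3-dimensional `TOY_VECTORS` stand in for a pre-trained multilingual model, `TOY_LANG` is a hypothetical word-level language lookup, the seed word lists are invented, and summing per-segment differences of cosine similarity is just one plausible way to combine segment scores.

```python
import math
from typing import Dict, List, Tuple

# HYPOTHETICAL toy vectors standing in for a pre-trained multilingual model.
# Translation pairs (good/bueno, bad/malo, ...) get similar vectors to mimic
# a shared cross-lingual embedding space.
TOY_VECTORS: Dict[str, List[float]] = {
    "good": [0.9, 0.1, 0.0], "bueno": [0.85, 0.15, 0.0],
    "great": [0.95, 0.05, 0.1], "feliz": [0.8, 0.2, 0.1],
    "bad": [0.1, 0.9, 0.0], "malo": [0.15, 0.85, 0.0],
    "terrible": [0.05, 0.95, 0.1], "triste": [0.2, 0.8, 0.1],
    "the": [0.3, 0.3, 0.9], "la": [0.3, 0.3, 0.9],
    "movie": [0.3, 0.3, 0.8], "película": [0.3, 0.3, 0.8],
    "was": [0.3, 0.3, 0.85], "es": [0.3, 0.3, 0.85],
    "very": [0.3, 0.3, 0.7], "muy": [0.3, 0.3, 0.7],
}

# HYPOTHETICAL word-level language lookup; a real system would use a
# language-identification model instead.
TOY_LANG: Dict[str, str] = {
    **{w: "en" for w in ["good", "great", "bad", "terrible", "the", "movie", "was", "very"]},
    **{w: "es" for w in ["bueno", "feliz", "malo", "triste", "la", "película", "es", "muy"]},
}

# HYPOTHETICAL handcrafted seed words (mixed Spanish and English).
POSITIVE_SEEDS = ["good", "great", "bueno", "feliz"]
NEGATIVE_SEEDS = ["bad", "terrible", "malo", "triste"]


def language_tags(tokens: List[str]) -> List[str]:
    """Tag each token with a language; unknown tokens inherit the previous tag."""
    tags: List[str] = []
    prev = "unk"
    for tok in tokens:
        prev = TOY_LANG.get(tok, prev)
        tags.append(prev)
    return tags


def switch_points(tokens: List[str]) -> List[int]:
    """Indices where a token's language differs from the preceding token's."""
    tags = language_tags(tokens)
    return [i for i in range(1, len(tags)) if tags[i] != tags[i - 1]]


def segment(tokens: List[str]) -> List[List[str]]:
    """Split the tokens into monolingual runs at the code-switch points."""
    cuts = [0] + switch_points(tokens) + [len(tokens)]
    return [tokens[a:b] for a, b in zip(cuts, cuts[1:])]


def embed(tokens: List[str]) -> List[float]:
    """Mean of the known token vectors (zero vector if none are known)."""
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text: str) -> Tuple[str, List[float]]:
    """Sum, over segments, similarity to positive seeds minus similarity to negative seeds."""
    pos_centroid = embed(POSITIVE_SEEDS)
    neg_centroid = embed(NEGATIVE_SEEDS)
    scores = [
        cosine(embed(seg), pos_centroid) - cosine(embed(seg), neg_centroid)
        for seg in segment(text.lower().split())
    ]
    total = sum(scores)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, scores


if __name__ == "__main__":
    print(classify("the movie was muy bueno"))
    print(classify("la película es very bad"))
```

In a real setting, `embed` would call a pre-trained multilingual encoder and `TOY_LANG` would be replaced by a language-identification model; segmenting at switch points lets each monolingual run be scored on its own before the scores are combined.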
doi_str_mv	10.48550/arxiv.2210.16461
format	Article
creationdate	2022-10-28
rights	http://creativecommons.org/licenses/by-nc-nd/4.0
linktorsrc	https://arxiv.org/abs/2210.16461
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2210.16461
language	eng
recordid	cdi_arxiv_primary_2210_16461
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Computation and Language
title	Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T07%3A54%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sentiment%20Classification%20of%20Code-Switched%20Text%20using%20Pre-trained%20Multilingual%20Embeddings%20and%20Segmentation&rft.au=Aryal,%20Saurav%20K&rft.date=2022-10-28&rft_id=info:doi/10.48550/arxiv.2210.16461&rft_dat=%3Carxiv_GOX%3E2210_16461%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true