Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation

With increasing globalization and immigration, various studies have estimated that about half of the world population is bilingual. Consequently, individuals concurrently use two or more languages or dialects in casual conversational settings. However, most research in natural language processing is...

Bibliographic details
Main authors: Aryal, Saurav K; Prioleau, Howard; Washington, Gloria
Format: Article
Language: eng
creator Aryal, Saurav K
Prioleau, Howard
Washington, Gloria
description With increasing globalization and immigration, various studies have estimated that about half of the world population is bilingual. Consequently, individuals concurrently use two or more languages or dialects in casual conversational settings. However, most research in natural language processing is focused on monolingual text. To further the work in code-switched sentiment analysis, we propose a multi-step natural language processing algorithm that identifies points of code-switching in mixed text and conducts sentiment analysis around those points. The proposed algorithm uses semantic similarity derived from large pre-trained multilingual models, together with a handcrafted set of positive and negative words, to determine the polarity of code-switched text. The proposed approach outperforms a comparable baseline model by 11.2% in accuracy and 11.64% in F1-score on a Spanish-English dataset. Theoretically, the proposed algorithm can be extended to sentiment analysis of multiple languages with limited human expertise.
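The pipeline outlined in the description (segment at code-switch points, then score polarity by semantic similarity to seed words) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional `TOY_EMBEDDINGS`, the `SPANISH_WORDS` lexicon used for language identification, the seed lists, and the rule of summing per-segment averages are all assumptions made for illustration. A real system would obtain vectors from a pre-trained multilingual model.

```python
import math

# Hypothetical toy vectors standing in for pre-trained multilingual embeddings.
TOY_EMBEDDINGS = {
    "good": (0.9, 0.1, 0.0),
    "bueno": (0.85, 0.15, 0.0),
    "bad": (-0.9, 0.1, 0.0),
    "malo": (-0.85, 0.15, 0.0),
    "movie": (0.0, 0.9, 0.1),
    "pelicula": (0.0, 0.85, 0.2),
}

# Handcrafted seed words (assumed, for illustration only).
POSITIVE_SEEDS = ("good", "bueno")
NEGATIVE_SEEDS = ("bad", "malo")

# Assumed mini-lexicon used as a stand-in for a language identifier.
SPANISH_WORDS = {"bueno", "malo", "pelicula", "pero", "muy", "la", "es"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def language_of(token):
    """Label a token 'es' if it is in the toy lexicon, otherwise 'en'."""
    return "es" if token in SPANISH_WORDS else "en"


def segment_at_switch_points(tokens):
    """Split tokens into runs that share a language label."""
    segments = []
    for token in tokens:
        if segments and language_of(segments[-1][-1]) == language_of(token):
            segments[-1].append(token)
        else:
            segments.append([token])
    return segments


def token_polarity(token):
    """Best similarity to a positive seed minus best similarity to a negative seed."""
    vec = TOY_EMBEDDINGS.get(token)
    if vec is None:  # unknown words contribute nothing
        return 0.0
    pos = max(cosine(vec, TOY_EMBEDDINGS[s]) for s in POSITIVE_SEEDS)
    neg = max(cosine(vec, TOY_EMBEDDINGS[s]) for s in NEGATIVE_SEEDS)
    return pos - neg


def classify(text):
    """Sum the mean token polarity of each segment and map the sign to a label."""
    segments = segment_at_switch_points(text.lower().split())
    score = sum(sum(token_polarity(t) for t in seg) / len(seg) for seg in segments)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"


if __name__ == "__main__":
    print(segment_at_switch_points("the movie was muy bueno".split()))
    print(classify("the movie was muy bueno"))
    print(classify("la pelicula was bad"))
```

Because the seed words and the text share one embedding space, a Spanish token such as "bueno" can be scored against English seeds and vice versa, which is the property the description relies on when it suggests extension to other languages with limited human expertise.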
doi_str_mv 10.48550/arxiv.2210.16461
format Article
creationdate 2022-10-28
rights http://creativecommons.org/licenses/by-nc-nd/4.0
oa free_for_read
sourcetype Open Access Repository
collection arXiv Computer Science
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2210.16461
language eng
recordid cdi_arxiv_primary_2210_16461
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
title Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T07%3A54%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sentiment%20Classification%20of%20Code-Switched%20Text%20using%20Pre-trained%20Multilingual%20Embeddings%20and%20Segmentation&rft.au=Aryal,%20Saurav%20K&rft.date=2022-10-28&rft_id=info:doi/10.48550/arxiv.2210.16461&rft_dat=%3Carxiv_GOX%3E2210_16461%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true