Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation
With increasing globalization and immigration, various studies have estimated that about half of the world population is bilingual. Consequently, individuals concurrently use two or more languages or dialects in casual conversational settings. However, most research in natural language processing is...
Main authors: | Aryal, Saurav K; Prioleau, Howard; Washington, Gloria |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Aryal, Saurav K; Prioleau, Howard; Washington, Gloria |
description | With increasing globalization and immigration, various studies have estimated
that about half of the world population is bilingual. Consequently, individuals
concurrently use two or more languages or dialects in casual conversational
settings. However, most research in natural language processing is focused on
monolingual text. To further the work in code-switched sentiment analysis, we
propose a multi-step natural language processing algorithm utilizing points of
code-switching in mixed text and conduct sentiment analysis around those
identified points. The proposed sentiment analysis algorithm uses semantic
similarity derived from large pre-trained multilingual models with a
handcrafted set of positive and negative words to determine the polarity of
code-switched text. The proposed approach outperforms a comparable baseline
model by 11.2% for accuracy and 11.64% for F1-score on a Spanish-English
dataset. Theoretically, the proposed algorithm can be expanded for sentiment
analysis of multiple languages with limited human expertise. |
doi_str_mv | 10.48550/arxiv.2210.16461 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2210_16461</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2210_16461</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation</title><source>arXiv.org</source><creator>Aryal, Saurav K ; Prioleau, Howard ; Washington, Gloria</creator><identifier>DOI: 10.48550/arxiv.2210.16461</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computation and Language</subject><creationdate>2022-10</creationdate><rights>http://creativecommons.org/licenses/by-nc-nd/4.0</rights><oa>free_for_read</oa></display><links><linktorsrc>$$Uhttps://arxiv.org/abs/2210.16461$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2210.16461$$DView paper in arXiv$$Hfree_for_read</backlink></links><addata><au>Aryal, Saurav K</au><au>Prioleau, Howard</au><au>Washington, Gloria</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation</atitle><date>2022-10-28</date><risdate>2022</risdate><doi>10.48550/arxiv.2210.16461</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2210.16461 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2210_16461 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language |
title | Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T07%3A54%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Sentiment%20Classification%20of%20Code-Switched%20Text%20using%20Pre-trained%20Multilingual%20Embeddings%20and%20Segmentation&rft.au=Aryal,%20Saurav%20K&rft.date=2022-10-28&rft_id=info:doi/10.48550/arxiv.2210.16461&rft_dat=%3Carxiv_GOX%3E2210_16461%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
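
The approach summarized in the description (split mixed text at points of code-switching, then score each segment by semantic similarity to handcrafted positive and negative word sets) might be sketched roughly as below. This is an illustrative sketch only and not the authors' implementation: the two-dimensional vectors in `TOY_VECTORS`, the word-to-language lookup `TOY_LANG`, the seed lists, and every function name are hypothetical stand-ins. A real system would presumably take its embeddings from a large pre-trained multilingual model and use a proper language identifier.

```python
import math

# Hypothetical toy vectors; a real system would query a pre-trained multilingual model.
TOY_VECTORS = {
    "love": (0.9, 0.2), "great": (0.8, 0.3), "encanta": (0.85, 0.25),
    "hate": (-0.9, 0.2), "horrible": (-0.8, 0.3), "malo": (-0.85, 0.2),
    "movie": (0.0, 1.0), "película": (0.0, 1.0),
}
# Hypothetical word-level language lookup used to find code-switch points.
TOY_LANG = {
    "i": "en", "love": "en", "the": "en", "movie": "en", "great": "en", "hate": "en",
    "la": "es", "película": "es", "fue": "es", "malo": "es", "me": "es", "encanta": "es",
}
# Handcrafted seed words (assumed for illustration).
POSITIVE_SEEDS = ("love", "great", "encanta")
NEGATIVE_SEEDS = ("hate", "horrible", "malo")


def mean_vector(words):
    """Average the toy vectors of the known words; None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def segment_by_language(tokens):
    """Split tokens into runs of one language; unknown tokens join the current run."""
    segments, current_lang = [], None
    for token in tokens:
        lang = TOY_LANG.get(token, current_lang)
        if segments and (current_lang is None or lang == current_lang):
            segments[-1].append(token)
        else:
            segments.append([token])
        current_lang = lang
    return segments


def classify(text):
    """Sum (similarity to positive seeds - similarity to negative seeds) over segments."""
    positive = mean_vector(POSITIVE_SEEDS)
    negative = mean_vector(NEGATIVE_SEEDS)
    score = 0.0
    for segment in segment_by_language(text.lower().split()):
        vector = mean_vector(segment)
        if vector is not None:
            score += cosine(vector, positive) - cosine(vector, negative)
    if score > 0:
        return "positive"
    return "negative" if score < 0 else "neutral"
```

Calling `classify("i love la película")` first splits the input into an English and a Spanish segment and then sums the per-segment similarity differences. Summing over segments is one simple aggregation choice; the record does not say how the paper combines evidence around the identified switch points.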