Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding

Bibliographic details
Published in: Expert systems 2024-07, Vol. 41 (7), p. n/a
Main authors: Jain, Minni; Jindal, Rajni; Jain, Amita
Format: Article
Language: English
Online access: Full text
container_end_page n/a
container_issue 7
container_start_page
container_title Expert systems
container_volume 41
creator Jain, Minni
Jindal, Rajni
Jain, Amita
description Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which create a bottleneck in the performance of natural language processing applications. This work is the first approach for code‐mixed Hindi‐English social media text that comprises language identification and the detection and correction of non‐word (out‐of‐vocabulary) errors as well as real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected separately for each language, and a list of suggestions is created for each erroneous word. A fuzzy graph is then built between the words of the different suggestion lists using various semantic relations in Hindi WordNet. Word embeddings and fuzzy graph‐based centrality measures are used to select the correct word. Several experiments are performed on social media datasets taken from Instagram, Twitter, YouTube comments, blogs, and WhatsApp. The experimental results show that the proposed system corrects out‐of‐vocabulary words and real‐word errors with a maximum recall of 0.90 and 0.67, respectively, for Dev_Hindi and 0.87 and 0.66, respectively, for Rom_Hindi. The proposed method is also applied to state‐of‐the‐art sentiment analysis approaches, where it improves the F1‐score.
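The correction step described above (suggestion lists linked in a fuzzy graph, then ranked with word embeddings and fuzzy centrality) can be illustrated with a minimal sketch. It is not the authors' formulation. Everything in it is an assumption for illustration: `EMBED` is a hypothetical toy embedding table in place of trained Word2Vec vectors, `RELATION` is a hypothetical relation-strength table standing in for Hindi WordNet relations, the edge membership is an assumed mix of clipped cosine similarity and relation strength weighted by `alpha`, and fuzzy degree centrality (the sum of incident edge memberships) is used as the centrality measure.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
EMBED = {
    "khana": [0.9, 0.1, 0.0],
    "kana": [0.1, 0.9, 0.0],
    "khaya": [0.8, 0.2, 0.1],
    "kaya": [0.0, 0.3, 0.9],
}

# Hypothetical relation strengths in [0, 1] standing in for Hindi WordNet
# relations; unlisted pairs count as unrelated (0.0).
RELATION = {frozenset({"khana", "khaya"}): 0.8}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def membership(w1, w2, alpha=0.5):
    """Assumed edge membership in [0, 1]: weighted mix of embedding
    similarity (negative cosine clipped to 0) and relation strength."""
    sim = max(0.0, cosine(EMBED[w1], EMBED[w2]))
    rel = RELATION.get(frozenset({w1, w2}), 0.0)
    return alpha * sim + (1 - alpha) * rel


def build_fuzzy_graph(candidate_lists):
    """Fuzzy edges between candidates of different erroneous positions.
    Nodes are (position, word) pairs; candidates at the same position
    compete with each other, so they are not linked."""
    edges = {}
    for (i, ci), (j, cj) in combinations(enumerate(candidate_lists), 2):
        for a in ci:
            for b in cj:
                mu = membership(a, b)
                if mu > 0.0:
                    edges[((i, a), (j, b))] = mu
    return edges


def fuzzy_degree(edges):
    """Fuzzy degree centrality: sum of memberships of incident edges."""
    degree = {}
    for (u, v), mu in edges.items():
        degree[u] = degree.get(u, 0.0) + mu
        degree[v] = degree.get(v, 0.0) + mu
    return degree


def choose(candidate_lists):
    """Pick, at each position, the candidate with the highest centrality."""
    degree = fuzzy_degree(build_fuzzy_graph(candidate_lists))
    return [
        max(cands, key=lambda w: degree.get((i, w), 0.0))
        for i, cands in enumerate(candidate_lists)
    ]


print(choose([["khana", "kana"], ["khaya", "kaya"]]))  # ['khana', 'khaya']
```

With the toy data, the two positions pick `khana` and `khaya` because those candidates share the strongest fuzzy edge. Centrality is left unnormalized: dividing every score by the same constant would not change which candidate wins at a position.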
doi_str_mv 10.1111/exsy.13328
format Article
linktohtml https://onlinelibrary.wiley.com/doi/full/10.1111%2Fexsy.13328
linktopdf https://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fexsy.13328
orcidid https://orcid.org/0000-0003-0891-3675
publisher Oxford: Blackwell Publishing Ltd
rights 2023 John Wiley & Sons Ltd.
2024 John Wiley & Sons, Ltd.
toplevel peer_reviewed
online_resources
tpages 22
fulltext fulltext
identifier ISSN: 0266-4720
ispartof Expert systems, 2024-07, Vol.41 (7), p.n/a
issn 0266-4720
1468-0394
language eng
recordid cdi_proquest_journals_3063859678
source Wiley Online Library Journals Frontfile Complete
subjects Data mining
Digital media
Errors
fuzzy centrality measures
fuzzy graphs
Hindi WordNet
Natural language processing
real‐word error and non‐word error
Semantics
Sentiment analysis
Social networks
text normalization
Word2Vec
Words (language)
title Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T17%3A20%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Code%E2%80%90mixed%20Hindi%E2%80%90English%20text%20correction%20using%20fuzzy%20graph%20and%20word%20embedding&rft.jtitle=Expert%20systems&rft.au=Jain,%20Minni&rft.date=2024-07&rft.volume=41&rft.issue=7&rft.epage=n/a&rft.issn=0266-4720&rft.eissn=1468-0394&rft_id=info:doi/10.1111/exsy.13328&rft_dat=%3Cproquest_cross%3E3063859678%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3063859678&rft_id=info:pmid/&rfr_iscdi=true