Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding

Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which creates a bottleneck in the performance of natural language processing applications. This proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises langua...

Detailed description

Bibliographic details
Published in:	Expert systems 2024-07, Vol.41 (7), p.n/a
Main authors:	Jain, Minni; Jindal, Rajni; Jain, Amita
Format:	Article
Language:	eng
Subjects:	Data mining; Digital media; Errors; fuzzy centrality measures; fuzzy graphs; Hindi WordNet; Natural language processing; real‐word error and non‐word error; Semantics; Sentiment analysis; Social networks; text normalization; Word2Vec; Words (language)
Online access:	Full text

container_end_page	n/a
container_issue	7
container_start_page
container_title	Expert systems
container_volume	41
creator	Jain, Minni; Jindal, Rajni; Jain, Amita
description	Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which create a bottleneck in the performance of natural language processing applications. The proposed work is the first approach for code‐mixed Hindi‐English social media text that combines language identification with the detection and correction of non‐word (out‐of‐vocabulary) errors and real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected individually for each language and a suggestive list of the erroneous words is created. A fuzzy graph between different words of the suggestive lists is then generated using various semantic relations in Hindi WordNet. Word embeddings and fuzzy graph‐based centrality measures are used to find the correct word. Several experiments are performed on different social media datasets taken from Instagram, Twitter, YouTube comments, blogs, and WhatsApp. The experimental results demonstrate that the proposed system corrects out‐of‐vocabulary words and real‐word errors with a maximum recall of 0.90 and 0.67, respectively, for Dev_Hindi and 0.87 and 0.66, respectively, for Rom_Hindi. The proposed method is also applied to state‐of‐the‐art sentiment analysis approaches, where it visibly improves the F1‐score.
doi_str_mv	10.1111/exsy.13328
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3063859678</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3063859678</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3018-a3899a1b3540f181211c55146bbd5cc3016706d7eac2cde8b32131f11bccad533</originalsourceid><addsrcrecordid>eNp9kN1KwzAUgIMoOKc3PkHAO6Ezp2nT9FLGdMJARAW9CmmSbhlbM5OWrbvyEXxGn8TOeu25ORzOd374ELoEMoIubswutCOgNOZHaAAJ4xGheXKMBiRmLEqymJyisxCWhBDIMjZAT2Onzffn19rujMZTW2nbVZNqvrJhgWuzq7Fy3htVW1fhJthqjstmv2_x3MvNAstK463zGpt1YbTu2ufopJSrYC7-8hC93k1extNo9nj_ML6dRYoS4JGkPM8lFDRNSAkcYgCVpt3LRaFTdWBYRpjOjFSx0oYXNAYKJUChlNQppUN01e_dePfRmFCLpWt81Z0UlDDK05xlvKOue0p5F4I3pdh4u5a-FUDEQZk4KBO_yjoYenhrV6b9hxSTt-f3fuYHFLZwjA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3063859678</pqid></control><display><type>article</type><title>Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Jain, Minni ; Jindal, Rajni ; Jain, Amita</creator><creatorcontrib>Jain, Minni ; Jindal, Rajni ; Jain, Amita</creatorcontrib><description>Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which creates a bottleneck in the performance of natural language processing applications. This proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises language identification, detection and correction of non‐word (Out of Vocabulary) errors as well as real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected individually for each language and a suggestive list of the erroneous words is created. 
After this, a fuzzy graph between different words of the suggestive lists is generated using various semantic relations in Hindi WordNet. Word embeddings and Fuzzy graph‐based centrality measures are used to find the correct word. Several experiments are performed on different social media datasets taken from Instagram, Twitter, YouTube comments, Blogs, and WhatsApp. The experimental results demonstrate that the proposed system corrects out‐of‐vocabulary words as well as real‐word errors with a maximum recall of 0.90 and 0.67, respectively for Dev_Hindi and 0.87 and 0.66, respectively for Rom_Hindi. The proposed method is also applied for state‐of‐art sentiment analysis approaches where the F1‐score has been visibly improved.</description><identifier>ISSN: 0266-4720</identifier><identifier>EISSN: 1468-0394</identifier><identifier>DOI: 10.1111/exsy.13328</identifier><language>eng</language><publisher>Oxford: Blackwell Publishing Ltd</publisher><subject>Data mining ; Digital media ; Errors ; fuzzy centrality measures ; fuzzy graphs ; Hindi WordNet ; Natural language processing ; real‐word error and non‐word error ; Semantics ; Sentiment analysis ; Social networks ; text normalization ; Word2Vec ; Words (language)</subject><ispartof>Expert systems, 2024-07, Vol.41 (7), p.n/a</ispartof><rights>2023 John Wiley & Sons Ltd.</rights><rights>2024 John Wiley & Sons, 
Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3018-a3899a1b3540f181211c55146bbd5cc3016706d7eac2cde8b32131f11bccad533</citedby><cites>FETCH-LOGICAL-c3018-a3899a1b3540f181211c55146bbd5cc3016706d7eac2cde8b32131f11bccad533</cites><orcidid>0000-0003-0891-3675</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1111%2Fexsy.13328$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1111%2Fexsy.13328$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Jain, Minni</creatorcontrib><creatorcontrib>Jindal, Rajni</creatorcontrib><creatorcontrib>Jain, Amita</creatorcontrib><title>Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding</title><title>Expert systems</title><description>Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which creates a bottleneck in the performance of natural language processing applications. This proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises language identification, detection and correction of non‐word (Out of Vocabulary) errors as well as real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected individually for each language and a suggestive list of the erroneous words is created. After this, a fuzzy graph between different words of the suggestive lists is generated using various semantic relations in Hindi WordNet. Word embeddings and Fuzzy graph‐based centrality measures are used to find the correct word. 
Several experiments are performed on different social media datasets taken from Instagram, Twitter, YouTube comments, Blogs, and WhatsApp. The experimental results demonstrate that the proposed system corrects out‐of‐vocabulary words as well as real‐word errors with a maximum recall of 0.90 and 0.67, respectively for Dev_Hindi and 0.87 and 0.66, respectively for Rom_Hindi. The proposed method is also applied for state‐of‐art sentiment analysis approaches where the F1‐score has been visibly improved.</description><subject>Data mining</subject><subject>Digital media</subject><subject>Errors</subject><subject>fuzzy centrality measures</subject><subject>fuzzy graphs</subject><subject>Hindi WordNet</subject><subject>Natural language processing</subject><subject>real‐word error and non‐word error</subject><subject>Semantics</subject><subject>Sentiment analysis</subject><subject>Social networks</subject><subject>text normalization</subject><subject>Word2Vec</subject><subject>Words (language)</subject><issn>0266-4720</issn><issn>1468-0394</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kN1KwzAUgIMoOKc3PkHAO6Ezp2nT9FLGdMJARAW9CmmSbhlbM5OWrbvyEXxGn8TOeu25ORzOd374ELoEMoIubswutCOgNOZHaAAJ4xGheXKMBiRmLEqymJyisxCWhBDIMjZAT2Onzffn19rujMZTW2nbVZNqvrJhgWuzq7Fy3htVW1fhJthqjstmv2_x3MvNAstK463zGpt1YbTu2ufopJSrYC7-8hC93k1extNo9nj_ML6dRYoS4JGkPM8lFDRNSAkcYgCVpt3LRaFTdWBYRpjOjFSx0oYXNAYKJUChlNQppUN01e_dePfRmFCLpWt81Z0UlDDK05xlvKOue0p5F4I3pdh4u5a-FUDEQZk4KBO_yjoYenhrV6b9hxSTt-f3fuYHFLZwjA</recordid><startdate>202407</startdate><enddate>202407</enddate><creator>Jain, Minni</creator><creator>Jindal, Rajni</creator><creator>Jain, Amita</creator><general>Blackwell Publishing 
Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-0891-3675</orcidid></search><sort><creationdate>202407</creationdate><title>Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding</title><author>Jain, Minni ; Jindal, Rajni ; Jain, Amita</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3018-a3899a1b3540f181211c55146bbd5cc3016706d7eac2cde8b32131f11bccad533</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Data mining</topic><topic>Digital media</topic><topic>Errors</topic><topic>fuzzy centrality measures</topic><topic>fuzzy graphs</topic><topic>Hindi WordNet</topic><topic>Natural language processing</topic><topic>real‐word error and non‐word error</topic><topic>Semantics</topic><topic>Sentiment analysis</topic><topic>Social networks</topic><topic>text normalization</topic><topic>Word2Vec</topic><topic>Words (language)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jain, Minni</creatorcontrib><creatorcontrib>Jindal, Rajni</creatorcontrib><creatorcontrib>Jain, Amita</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>Expert systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jain, Minni</au><au>Jindal, Rajni</au><au>Jain, Amita</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding</atitle><jtitle>Expert systems</jtitle><date>2024-07</date><risdate>2024</risdate><volume>41</volume><issue>7</issue><epage>n/a</epage><issn>0266-4720</issn><eissn>1468-0394</eissn><abstract>Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which creates a bottleneck in the performance of natural language processing applications. This proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises language identification, detection and correction of non‐word (Out of Vocabulary) errors as well as real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected individually for each language and a suggestive list of the erroneous words is created. After this, a fuzzy graph between different words of the suggestive lists is generated using various semantic relations in Hindi WordNet. Word embeddings and Fuzzy graph‐based centrality measures are used to find the correct word. Several experiments are performed on different social media datasets taken from Instagram, Twitter, YouTube comments, Blogs, and WhatsApp. The experimental results demonstrate that the proposed system corrects out‐of‐vocabulary words as well as real‐word errors with a maximum recall of 0.90 and 0.67, respectively for Dev_Hindi and 0.87 and 0.66, respectively for Rom_Hindi. 
The proposed method is also applied for state‐of‐art sentiment analysis approaches where the F1‐score has been visibly improved.</abstract><cop>Oxford</cop><pub>Blackwell Publishing Ltd</pub><doi>10.1111/exsy.13328</doi><tpages>22</tpages><orcidid>https://orcid.org/0000-0003-0891-3675</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 0266-4720
ispartof	Expert systems, 2024-07, Vol.41 (7), p.n/a
issn	0266-4720 1468-0394
language	eng
recordid	cdi_proquest_journals_3063859678
source	Wiley Online Library Journals Frontfile Complete
subjects	Data mining; Digital media; Errors; fuzzy centrality measures; fuzzy graphs; Hindi WordNet; Natural language processing; real‐word error and non‐word error; Semantics; Sentiment analysis; Social networks; text normalization; Word2Vec; Words (language)
title	Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T17%3A20%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Code%E2%80%90mixed%20Hindi%E2%80%90English%20text%20correction%20using%20fuzzy%20graph%20and%20word%20embedding&rft.jtitle=Expert%20systems&rft.au=Jain,%20Minni&rft.date=2024-07&rft.volume=41&rft.issue=7&rft.epage=n/a&rft.issn=0266-4720&rft.eissn=1468-0394&rft_id=info:doi/10.1111/exsy.13328&rft_dat=%3Cproquest_cross%3E3063859678%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3063859678&rft_id=info:pmid/&rfr_iscdi=true