Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding
Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which creates a bottleneck in the performance of natural language processing applications. This proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises langua...
Published in: | Expert systems 2024-07, Vol.41 (7), p.n/a |
---|---|
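Main authors: | Jain, Minni ; Jindal, Rajni ; Jain, Amita |
Format: | Article |
Language: | eng |
Subjects: | Data mining ; Digital media ; Errors ; fuzzy centrality measures ; fuzzy graphs ; Hindi WordNet ; Natural language processing ; real‐word error and non‐word error ; Semantics ; Sentiment analysis ; Social networks ; text normalization ; Word2Vec ; Words (language) |
Online access: | Full text |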
container_end_page | n/a |
---|---|
container_issue | 7 |
container_start_page | |
container_title | Expert systems |
container_volume | 41 |
creator | Jain, Minni ; Jindal, Rajni ; Jain, Amita |
description | Interaction via social media involves frequent code‐mixed text, spelling errors and noisy elements, which creates a bottleneck in the performance of natural language processing applications. The proposed work is the first approach for code‐mixed Hindi‐English social media text that comprises language identification and the detection and correction of non‐word (out‐of‐vocabulary) errors as well as real‐word errors occurring simultaneously. Each identified language (Devanagari Hindi, Roman Hindi, and English) has its own complexities and challenges. Errors are detected individually for each language, and a suggestion list is created for each erroneous word. A fuzzy graph between the words of the different suggestion lists is then generated using various semantic relations in Hindi WordNet. Word embeddings and fuzzy graph‐based centrality measures are used to find the correct word. Several experiments are performed on different social media datasets taken from Instagram, Twitter, YouTube comments, blogs, and WhatsApp. The experimental results demonstrate that the proposed system corrects out‐of‐vocabulary words and real‐word errors with a maximum recall of 0.90 and 0.67, respectively, for Dev_Hindi, and 0.87 and 0.66, respectively, for Rom_Hindi. The proposed method is also applied to state‐of‐the‐art sentiment analysis approaches, where it visibly improves the F1‐score. |
doi_str_mv | 10.1111/exsy.13328 |
format | Article |
fullrecord | EISSN: 1468-0394 ; DOI: 10.1111/exsy.13328 ; publisher: Oxford: Blackwell Publishing Ltd ; rights: 2023 John Wiley & Sons Ltd. ; 2024 John Wiley & Sons, Ltd. ; peer_reviewed ; sourcetype: Aggregation Database ; tpages: 22 ; orcid: https://orcid.org/0000-0003-0891-3675 ; linktopdf: https://onlinelibrary.wiley.com/doi/pdf/10.1111/exsy.13328 ; linktohtml: https://onlinelibrary.wiley.com/doi/full/10.1111/exsy.13328 |
fulltext | fulltext |
identifier | ISSN: 0266-4720 |
ispartof | Expert systems, 2024-07, Vol.41 (7), p.n/a |
issn | 0266-4720 ; 1468-0394 |
language | eng |
recordid | cdi_proquest_journals_3063859678 |
source | Wiley Online Library Journals Frontfile Complete |
subjects | Data mining ; Digital media ; Errors ; fuzzy centrality measures ; fuzzy graphs ; Hindi WordNet ; Natural language processing ; real‐word error and non‐word error ; Semantics ; Sentiment analysis ; Social networks ; text normalization ; Word2Vec ; Words (language) |
title | Code‐mixed Hindi‐English text correction using fuzzy graph and word embedding |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T17%3A20%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Code%E2%80%90mixed%20Hindi%E2%80%90English%20text%20correction%20using%20fuzzy%20graph%20and%20word%20embedding&rft.jtitle=Expert%20systems&rft.au=Jain,%20Minni&rft.date=2024-07&rft.volume=41&rft.issue=7&rft.epage=n/a&rft.issn=0266-4720&rft.eissn=1468-0394&rft_id=info:doi/10.1111/exsy.13328&rft_dat=%3Cproquest_cross%3E3063859678%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3063859678&rft_id=info:pmid/&rfr_iscdi=true |
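
The description above mentions choosing the correct word from suggestion lists by means of a fuzzy graph and centrality measures. The record does not give the authors' formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: the tokens, candidate lists, and edge membership values are invented for illustration, the function names `fuzzy_degree_centrality` and `pick_corrections` are hypothetical, and plain fuzzy degree centrality (the sum of the membership values of incident edges) stands in for whatever centrality measures the paper actually uses.

```python
from itertools import combinations

# Hypothetical suggestion lists: erroneous token -> candidate corrections.
suggestions = {
    "gud": ["good", "gut", "god"],
    "mrning": ["morning", "mourning"],
}

# Hypothetical fuzzy edge memberships in [0, 1] between candidates that
# belong to different tokens. The abstract says such relations come from
# Hindi WordNet and word embeddings; these values are made up.
membership = {
    frozenset(("good", "morning")): 0.9,
    frozenset(("good", "mourning")): 0.2,
    frozenset(("god", "morning")): 0.3,
    frozenset(("god", "mourning")): 0.4,
    frozenset(("gut", "morning")): 0.1,
}


def fuzzy_degree_centrality(suggestions, membership):
    """Sum the membership values of the edges incident to each candidate."""
    centrality = {c: 0.0 for cands in suggestions.values() for c in cands}
    # Only candidates from different suggestion lists are connected.
    for (_, cands_a), (_, cands_b) in combinations(suggestions.items(), 2):
        for a in cands_a:
            for b in cands_b:
                mu = membership.get(frozenset((a, b)), 0.0)
                centrality[a] += mu
                centrality[b] += mu
    return centrality


def pick_corrections(suggestions, membership):
    """For each erroneous token, keep the candidate with highest centrality."""
    centrality = fuzzy_degree_centrality(suggestions, membership)
    return {
        token: max(cands, key=lambda c: centrality[c])
        for token, cands in suggestions.items()
    }


corrections = pick_corrections(suggestions, membership)
print(corrections)
```

In this toy setup a candidate wins when it is strongly related to the candidates of the neighbouring words, which is the intuition behind using graph centrality for context-sensitive (real‐word) correction.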