Hybrid Tamil spell checker with combined character splitting

Summary: A spell checker is an application that finds spelling errors in a given text. Applications such as word processors, email, search engines, speech recognition, and social media forums need spell-checking tools to improve the correctness of the system. Spell checkin...

Bibliographic details
Published in:	Concurrency and computation, 2023-01, Vol. 35 (1), p. n/a
Main authors:	Sampath, Anbukkarasi; Shanmugavel, Varadhaganapathy
Format:	Article
Language:	eng
Subjects:	Algorithms; Digital media; edit distance; Errors; Languages; LSTM; Natural Language Processing; Search engines; Social networks; Speech recognition; Splitting; Tamil; Word processors
Online access:	Full text

container_end_page	n/a
container_issue	1
container_start_page
container_title	Concurrency and computation
container_volume	35
creator	Sampath, Anbukkarasi; Shanmugavel, Varadhaganapathy
description	Summary: A spell checker is an application that finds spelling errors in a given text. Applications such as word processors, email, search engines, speech recognition, and social media forums need spell-checking tools to improve the correctness of the system. Spell checking is fully implemented for languages such as English, French, and Chinese. For Indian regional languages, however, very little work has been done, and that only partially. Tamil is one such language: it needs a fully implemented spell-checking application because many people have started using it on social media platforms such as Facebook and Twitter. Spelling errors in Tamil fall into several categories: Sandhi errors, homophone errors (Mayangoli), and misspelled words. To tackle all of these errors, this paper proposes a new ensemble approach that combines Levenshtein's edit distance algorithm, a rule-based algorithm, and the Soundex algorithm with an LSTM (Long Short-Term Memory) model. A special feature, combined character splitting of Tamil letters, is used to feed the LSTM model and improve the performance of the system. The proposed system achieved an accuracy of 95.67%, a result approved by a Tamil scholar.
doi_str_mv	10.1002/cpe.7440
format	Article
date	2023-01-10
eissn	1532-0634
orcid	https://orcid.org/0000-0003-0226-8150
publisher	Hoboken, USA: John Wiley & Sons, Inc
rights	2022 John Wiley & Sons, Ltd.; 2023 John Wiley & Sons, Ltd.
tpages	14
fulltext	fulltext
identifier	ISSN: 1532-0626
ispartof	Concurrency and computation, 2023-01, Vol.35 (1), p.n/a
issn	1532-0626 1532-0634
language	eng
recordid	cdi_proquest_journals_2748912470
source	Wiley Online Library Journals Frontfile Complete
subjects	Algorithms; Digital media; edit distance; Errors; Languages; LSTM; Natural Language Processing; Search engines; Social networks; Speech recognition; Splitting; Tamil; Word processors
title	Hybrid Tamil spell checker with combined character splitting
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T23%3A37%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hybrid%20Tamil%20spell%20checker%20with%20combined%20character%20splitting&rft.jtitle=Concurrency%20and%20computation&rft.au=Sampath,%20Anbukkarasi&rft.date=2023-01-10&rft.volume=35&rft.issue=1&rft.epage=n/a&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.7440&rft_dat=%3Cproquest_cross%3E2748912470%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2748912470&rft_id=info:pmid/&rfr_iscdi=true
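
The description field names two building blocks that a short sketch can make concrete: combined character splitting of Tamil letters and Levenshtein's edit distance. This record does not say how the paper performs the splitting, so the Python below is only one plausible reading: each combined (consonant plus vowel) letter is decomposed into a pure consonant and an independent vowel. The function names `split_combined` and `levenshtein` are hypothetical, and applying edit distance to the split units is an illustration here, not the authors' pipeline (the paper feeds the split characters to an LSTM).

```python
import unicodedata

PULLI = "\u0bcd"       # Tamil virama: marks a pure consonant
INHERENT_A = "\u0b85"  # independent vowel "a", implied by a bare consonant

# Dependent vowel sign -> independent vowel letter (Unicode Tamil block).
SIGN_TO_VOWEL = {
    "\u0bbe": "\u0b86",  # aa
    "\u0bbf": "\u0b87",  # i
    "\u0bc0": "\u0b88",  # ii
    "\u0bc1": "\u0b89",  # u
    "\u0bc2": "\u0b8a",  # uu
    "\u0bc6": "\u0b8e",  # e
    "\u0bc7": "\u0b8f",  # ee
    "\u0bc8": "\u0b90",  # ai
    "\u0bca": "\u0b92",  # o
    "\u0bcb": "\u0b93",  # oo
    "\u0bcc": "\u0b94",  # au
}


def is_consonant(ch):
    """True if ch lies in the Tamil consonant range U+0B95..U+0BB9."""
    return "\u0b95" <= ch <= "\u0bb9"


def split_combined(word):
    """Split a Tamil word into pure-consonant and vowel units (assumed scheme)."""
    chars = unicodedata.normalize("NFC", word)
    units = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if is_consonant(ch):
            units.append(ch + PULLI)
            if nxt == PULLI:            # already a pure consonant
                i += 2
            elif nxt in SIGN_TO_VOWEL:  # consonant + explicit vowel sign
                units.append(SIGN_TO_VOWEL[nxt])
                i += 2
            else:                       # bare consonant carries inherent "a"
                units.append(INHERENT_A)
                i += 1
        else:                           # independent vowel or other symbol
            units.append(ch)
            i += 1
    return units


def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of units)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    correct = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # the word "Tamil"
    typo = "\u0ba4\u0bae\u0bbf\u0bb3\u0bcd"     # final consonant swapped
    print(split_combined(correct))
    print(levenshtein(split_combined(correct), split_combined(typo)))
```

Working on split units rather than raw code points means a vowel change and a consonant change each count as a single edit on a linguistically meaningful unit, which is one reason such a split might help a sequence model. Whether the paper's feature behaves this way cannot be confirmed from this record.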