Hybrid Tamil spell checker with combined character splitting
Summary: A spell checker is an application that helps find spelling errors in a given text. Applications such as word processors, email, search engines, speech recognition, and social media forums need spell-checking tools to improve the correctness of the system. Spell checkin...
Published in: | Concurrency and computation 2023-01, Vol.35 (1), p.n/a |
---|---|
Main authors: | Sampath, Anbukkarasi ; Shanmugavel, Varadhaganapathy |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | n/a |
---|---|
container_issue | 1 |
container_start_page | |
container_title | Concurrency and computation |
container_volume | 35 |
creator | Sampath, Anbukkarasi ; Shanmugavel, Varadhaganapathy |
description | Summary
A spell checker is an application that helps find spelling errors in a given text. Applications such as word processors, email, search engines, speech recognition, and social media forums need spell-checking tools to improve the correctness of the system. Spell checking is fully implemented for languages such as English, French, and Chinese. For Indian regional languages, however, very little work has been carried out, and only partially. Tamil is one such Indian regional language; it requires a fully implemented spell-checking application because many people have started using it on social media platforms such as Facebook and Twitter. Spelling errors in Tamil fall into several categories: Sandhi errors, homophone errors (Mayangoli), and misspelt-word errors. To tackle all of these errors, this paper proposes a new ensemble approach. The approach combines Levenshtein's edit distance algorithm, a rule‐based algorithm, and the Soundex algorithm with an LSTM (Long Short‐Term Memory) model. A special feature, combined character splitting of Tamil letters, is used when feeding the LSTM model to improve the performance of the system. The proposed system achieved an accuracy of 95.67%, which was validated by a Tamil scholar. |
doi_str_mv | 10.1002/cpe.7440 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2748912470</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2748912470</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2230-9935c02b684de34bfeef818b6964353b0d17e6068925d74916eaf11ca619624e3</originalsourceid><addsrcrecordid>eNp10E1Lw0AQBuBFFKxV8CcEvHhJndndbLLgRUr9gIIe6nnZbCZ2a9rE3ZTSf29qxZunGV4eZuBl7BphggD8znU0yaWEEzbCTPAUlJCnfztX5-wixhUAIggcsfvnfRl8lSzs2jdJ7KhpErck90kh2fl-mbh2XfoNVUNqg3X9kMeu8X3vNx-X7Ky2TaSr3zlm74-zxfQ5nb8-vUwf5qnjXECqtcgc8FIVsiIhy5qoLrAolVZSZKKECnNSoArNsyqXGhXZGtFZhVpxSWLMbo53u9B-bSn2ZtVuw2Z4aXguC41c5jCo26NyoY0xUG264Nc27A2COXRjhm7MoZuBpke68w3t_3Vm-jb78d94I2M3</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2748912470</pqid></control><display><type>article</type><title>Hybrid Tamil spell checker with combined character splitting</title><source>Wiley Online Library Journals Frontfile Complete</source><creator>Sampath, Anbukkarasi ; Shanmugavel, Varadhaganapathy</creator><creatorcontrib>Sampath, Anbukkarasi ; Shanmugavel, Varadhaganapathy</creatorcontrib><description>Summary
Spell checker is the application, which helps in finding the spelling errors in a given text. Applications like word processors, mails, search engines, speech recognition and social media forums need these kinds of spell checking tools to increase the correctness of the system. Spell checking is completely implemented in languages such as English, French, and Chinese. But as far as Indian regional languages is concerned, very few works have been carried out, that too partially. Tamil is one such Indian regional language, which requires a fully implemented spell checking application as many people started using this language in social media platforms like Facebook and Twitter. Spelling errors fall on different categories in Tamil language, which involves Sandhi errors, Homophone errors (Mayangoli), and misspelt words error. To tackle all these errors, a new ensemble approach is proposed in this paper. The proposed approach consists of Levenshtein's edit distance algorithm, rule‐based algorithm, Soundex algorithm along with LSTM (Long Short Term Memory) model. We have used a special feature called combine character splitting of Tamil alphabets for feeding the LSTM model to improve the performance of the system. 
Proposed system produced an accuracy of 95.67%, which is approved by the Tamil scholar.</description><identifier>ISSN: 1532-0626</identifier><identifier>EISSN: 1532-0634</identifier><identifier>DOI: 10.1002/cpe.7440</identifier><language>eng</language><publisher>Hoboken, USA: John Wiley & Sons, Inc</publisher><subject>Algorithms ; Digital media ; edit distance ; Errors ; Languages ; LSTM ; Natural Language Processing ; Search engines ; Social networks ; Speech recognition ; Splitting ; Tamil ; Word processors</subject><ispartof>Concurrency and computation, 2023-01, Vol.35 (1), p.n/a</ispartof><rights>2022 John Wiley & Sons, Ltd.</rights><rights>2023 John Wiley & Sons, Ltd.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c2230-9935c02b684de34bfeef818b6964353b0d17e6068925d74916eaf11ca619624e3</citedby><cites>FETCH-LOGICAL-c2230-9935c02b684de34bfeef818b6964353b0d17e6068925d74916eaf11ca619624e3</cites><orcidid>0000-0003-0226-8150</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fcpe.7440$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fcpe.7440$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,776,780,1411,27901,27902,45550,45551</link.rule.ids></links><search><creatorcontrib>Sampath, Anbukkarasi</creatorcontrib><creatorcontrib>Shanmugavel, Varadhaganapathy</creatorcontrib><title>Hybrid Tamil spell checker with combined character splitting</title><title>Concurrency and computation</title><description>Summary
Spell checker is the application, which helps in finding the spelling errors in a given text. Applications like word processors, mails, search engines, speech recognition and social media forums need these kinds of spell checking tools to increase the correctness of the system. Spell checking is completely implemented in languages such as English, French, and Chinese. But as far as Indian regional languages is concerned, very few works have been carried out, that too partially. Tamil is one such Indian regional language, which requires a fully implemented spell checking application as many people started using this language in social media platforms like Facebook and Twitter. Spelling errors fall on different categories in Tamil language, which involves Sandhi errors, Homophone errors (Mayangoli), and misspelt words error. To tackle all these errors, a new ensemble approach is proposed in this paper. The proposed approach consists of Levenshtein's edit distance algorithm, rule‐based algorithm, Soundex algorithm along with LSTM (Long Short Term Memory) model. We have used a special feature called combine character splitting of Tamil alphabets for feeding the LSTM model to improve the performance of the system. 
Proposed system produced an accuracy of 95.67%, which is approved by the Tamil scholar.</description><subject>Algorithms</subject><subject>Digital media</subject><subject>edit distance</subject><subject>Errors</subject><subject>Languages</subject><subject>LSTM</subject><subject>Natural Language Processing</subject><subject>Search engines</subject><subject>Social networks</subject><subject>Speech recognition</subject><subject>Splitting</subject><subject>Tamil</subject><subject>Word processors</subject><issn>1532-0626</issn><issn>1532-0634</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp10E1Lw0AQBuBFFKxV8CcEvHhJndndbLLgRUr9gIIe6nnZbCZ2a9rE3ZTSf29qxZunGV4eZuBl7BphggD8znU0yaWEEzbCTPAUlJCnfztX5-wixhUAIggcsfvnfRl8lSzs2jdJ7KhpErck90kh2fl-mbh2XfoNVUNqg3X9kMeu8X3vNx-X7Ky2TaSr3zlm74-zxfQ5nb8-vUwf5qnjXECqtcgc8FIVsiIhy5qoLrAolVZSZKKECnNSoArNsyqXGhXZGtFZhVpxSWLMbo53u9B-bSn2ZtVuw2Z4aXguC41c5jCo26NyoY0xUG264Nc27A2COXRjhm7MoZuBpke68w3t_3Vm-jb78d94I2M3</recordid><startdate>20230110</startdate><enddate>20230110</enddate><creator>Sampath, Anbukkarasi</creator><creator>Shanmugavel, Varadhaganapathy</creator><general>John Wiley & Sons, Inc</general><general>Wiley Subscription Services, Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-0226-8150</orcidid></search><sort><creationdate>20230110</creationdate><title>Hybrid Tamil spell checker with combined character splitting</title><author>Sampath, Anbukkarasi ; Shanmugavel, Varadhaganapathy</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2230-9935c02b684de34bfeef818b6964353b0d17e6068925d74916eaf11ca619624e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Digital 
media</topic><topic>edit distance</topic><topic>Errors</topic><topic>Languages</topic><topic>LSTM</topic><topic>Natural Language Processing</topic><topic>Search engines</topic><topic>Social networks</topic><topic>Speech recognition</topic><topic>Splitting</topic><topic>Tamil</topic><topic>Word processors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sampath, Anbukkarasi</creatorcontrib><creatorcontrib>Shanmugavel, Varadhaganapathy</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Concurrency and computation</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sampath, Anbukkarasi</au><au>Shanmugavel, Varadhaganapathy</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Hybrid Tamil spell checker with combined character splitting</atitle><jtitle>Concurrency and computation</jtitle><date>2023-01-10</date><risdate>2023</risdate><volume>35</volume><issue>1</issue><epage>n/a</epage><issn>1532-0626</issn><eissn>1532-0634</eissn><abstract>Summary
Spell checker is the application, which helps in finding the spelling errors in a given text. Applications like word processors, mails, search engines, speech recognition and social media forums need these kinds of spell checking tools to increase the correctness of the system. Spell checking is completely implemented in languages such as English, French, and Chinese. But as far as Indian regional languages is concerned, very few works have been carried out, that too partially. Tamil is one such Indian regional language, which requires a fully implemented spell checking application as many people started using this language in social media platforms like Facebook and Twitter. Spelling errors fall on different categories in Tamil language, which involves Sandhi errors, Homophone errors (Mayangoli), and misspelt words error. To tackle all these errors, a new ensemble approach is proposed in this paper. The proposed approach consists of Levenshtein's edit distance algorithm, rule‐based algorithm, Soundex algorithm along with LSTM (Long Short Term Memory) model. We have used a special feature called combine character splitting of Tamil alphabets for feeding the LSTM model to improve the performance of the system. Proposed system produced an accuracy of 95.67%, which is approved by the Tamil scholar.</abstract><cop>Hoboken, USA</cop><pub>John Wiley & Sons, Inc</pub><doi>10.1002/cpe.7440</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0003-0226-8150</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1532-0626 |
ispartof | Concurrency and computation, 2023-01, Vol.35 (1), p.n/a |
issn | 1532-0626 ; 1532-0634 |
language | eng |
recordid | cdi_proquest_journals_2748912470 |
source | Wiley Online Library Journals Frontfile Complete |
subjects | Algorithms ; Digital media ; edit distance ; Errors ; Languages ; LSTM ; Natural Language Processing ; Search engines ; Social networks ; Speech recognition ; Splitting ; Tamil ; Word processors |
title | Hybrid Tamil spell checker with combined character splitting |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T23%3A37%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hybrid%20Tamil%20spell%20checker%20with%20combined%20character%20splitting&rft.jtitle=Concurrency%20and%20computation&rft.au=Sampath,%20Anbukkarasi&rft.date=2023-01-10&rft.volume=35&rft.issue=1&rft.epage=n/a&rft.issn=1532-0626&rft.eissn=1532-0634&rft_id=info:doi/10.1002/cpe.7440&rft_dat=%3Cproquest_cross%3E2748912470%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2748912470&rft_id=info:pmid/&rfr_iscdi=true |
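
The abstract names Levenshtein's edit distance and a character-splitting feature for Tamil but gives no implementation details. The following is only a minimal sketch of how the edit-distance component might rank correction candidates. It rests on assumptions not found in this record: the helper names `split_units`, `levenshtein`, and `suggest` are hypothetical, the word list is passed in by the caller, and grouping a base letter with its following vowel sign or pulli is one plausible reading of character units, not necessarily the paper's "combined character splitting". The rule-based, Soundex, and LSTM components are not shown.

```python
import unicodedata


def split_units(word):
    """Group each base code point with the combining marks that follow it.

    Assumption: Tamil vowel signs and the pulli have Unicode category
    Mc or Mn, so a consonant plus its sign becomes one unit.
    """
    units = []
    for ch in word:
        if units and unicodedata.category(ch) in ("Mn", "Mc"):
            units[-1] += ch  # attach the sign to the preceding base letter
        else:
            units.append(ch)
    return units


def levenshtein(a, b):
    """Classic dynamic-programming edit distance over two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=1):
    """Return lexicon entries within max_distance units, nearest first."""
    target = split_units(word)
    scored = [(levenshtein(target, split_units(w)), w) for w in lexicon]
    return [w for d, w in sorted(scored) if d <= max_distance]
```

Measuring distance over units rather than raw code points means a wrong vowel sign counts as a single substitution of one written letter, which may match how a reader perceives the error. Whether the paper measures distance this way is not stated in the record.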