An Empirical Analysis of Language Detection in Dravidian Languages
Objectives: Language detection is the process of identifying a language associated with a text. The proposed system aims to detect the Dravidian language that is associated with the given text using different machine learning and deep learning algorithms. The paper presents an empirical analysis of...
Published in: | Indian journal of science and technology, 2024-04, Vol.17 (15), p.1515-1526 |
---|---|
Main authors: | Shimi, G ; Mahibha, C Jerin ; Thenmozhi, Durairaj |
Format: | Article |
Language: | eng |
Online access: | Full text |
container_end_page | 1526 |
---|---|
container_issue | 15 |
container_start_page | 1515 |
container_title | Indian journal of science and technology |
container_volume | 17 |
creator | Shimi, G ; Mahibha, C Jerin ; Thenmozhi, Durairaj |
description | Objectives: Language detection is the process of identifying the language of a text. The proposed system aims to detect which Dravidian language a given text is written in, using different machine learning and deep learning algorithms. The paper presents an empirical analysis of the results obtained with the different models. It also evaluates the performance of a language-agnostic model for language detection. Method: Dravidian language identification in social media text is implemented using machine learning and deep learning approaches with k-fold cross-validation. The identification of Dravidian languages, including Tamil, Malayalam, Tamil Code Mix, and Malayalam Code Mix, is performed using both machine learning (ML) and deep learning (DL) algorithms. The machine learning algorithms used for language detection are Naive Bayes (NB), Multinomial Logistic Regression (MLR), Support Vector Machine (SVM), and Random Forest (RF). The supervised DL models used include BERT, mBERT, and language-agnostic models. Findings: The language-agnostic model outperforms all other models on the task of language detection in Dravidian languages. The results of both the ML and DL models are analyzed empirically with performance measures such as accuracy, precision, recall, and F1-score. The accuracy of the machine learning algorithms ranges from 85% to 89%. The experimental results show that the deep learning model performed best, with an accuracy of 98%. Novelty: The proposed system emphasizes the use of the language-agnostic model for detecting the Dravidian language of a given text, which yields a promising accuracy of 98%, higher than that of existing methodologies. Keywords: Language, Machine learning, Deep learning, Transformer model, Encoder, Decoder |
doi_str_mv | 10.17485/IJST/v17i15.765 |
format | Article |
creatorcontrib | Department of Computer Applications, Madras Christian College, Tambaram, Chennai, 600059, Tamil Nadu, India |
date | 2024-04-16 |
eissn | 0974-5645 |
oa | free_for_read |
tpages | 12 |
fulltext | fulltext |
identifier | ISSN: 0974-6846 |
ispartof | Indian journal of science and technology, 2024-04, Vol.17 (15), p.1515-1526 |
issn | 0974-6846 0974-5645 |
language | eng |
recordid | cdi_crossref_primary_10_17485_IJST_v17i15_765 |
source | Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
title | An Empirical Analysis of Language Detection in Dravidian Languages |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T21%3A44%3A26IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Empirical%20Analysis%20of%20Language%20Detection%20in%20Dravidian%20Languages&rft.jtitle=Indian%20journal%20of%20science%20and%20technology&rft.au=Shimi,%20G&rft.aucorp=Department%20of%20Computer%20Applications,%20Madras%20Christian%20College,%20Tambaram,%20Chennai,%20600059,%20Tamil%20Nadu,%20India&rft.date=2024-04-16&rft.volume=17&rft.issue=15&rft.spage=1515&rft.epage=1526&rft.pages=1515-1526&rft.issn=0974-6846&rft.eissn=0974-5645&rft_id=info:doi/10.17485/IJST/v17i15.765&rft_dat=%3Ccrossref%3E10_17485_IJST_v17i15_765%3C/crossref%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
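
The evaluation protocol named in the description (k-fold cross-validation scored with accuracy, precision, recall, and F1-score) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names, the macro-averaging of the per-class scores, the unshuffled contiguous folds, and the majority-class placeholder classifier are all assumptions introduced here, and the record does not state the fold count or averaging scheme used in the paper.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs that split range(n_samples) into k contiguous folds.

    Assumption: no shuffling or stratification; real experiments usually do both.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        prec = tp[label] / predicted if predicted else 0.0
        rec = tp[label] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def majority_baseline(train_texts, train_labels):
    """Hypothetical placeholder classifier: always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda texts: [most_common] * len(texts)


def cross_validate(texts, labels, k, fit):
    """Pool the predictions from all k folds and score them once.

    `fit(train_texts, train_labels)` must return a function mapping texts to predicted labels.
    """
    y_true, y_pred = [], []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        predict = fit([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        y_pred.extend(predict([texts[i] for i in test_idx]))
        y_true.extend(labels[i] for i in test_idx)
    return macro_scores(y_true, y_pred)
```

Any of the classifiers listed in the description could be plugged in through the `fit` argument in place of `majority_baseline`, for example by calling `cross_validate(texts, labels, 5, fit)` with a training function of the same shape. Pooling the fold predictions before scoring is one common choice; averaging the per-fold scores is another, and the two can differ slightly.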